Preface to the Second Edition


THE FAVORABLE RECEPTION OF THE FIRST EDITION SURPASSED 
the most daring anticipation and, in addition to an unexpected num- 
ber of users, the book seems to have found friends who read it merely 
for fun; it is most heartening that they range from pure mathematicians 
to pure amateurs. Although I cannot here express individual thanks to 
all readers to whom I am indebted for useful critical comments, their 
communications stimulated me during six years to think of improve- 
ments and to collect better examples and exercises. I hope that these 
will make for easier reading and teaching from the book. 

The general plan, as described in the preface to the first edition, re- 
mains unchanged. To accommodate the manifold needs of readers 
with divergent backgrounds, interests, and degrees of mathematical 
sophistication, it was necessary frequently to deviate from the main 
path. The exposition therefore does not always progress from the easy 
to the difficult; comparatively technical sections appear at the
beginning and easy sections in chapters XV and XVII. Inexperienced
readers should not attempt to follow many side lines lest they lose 
sight of the forest for too many trees. To facilitate orientation and 
the choice of desirable omissions, stars are used more systematically 
than in the first edition. The unstarred sections form a self-contained 
whole in which the starred sections are not used. 

A first introduction to the basic notions of probability is contained in 
chapters I, V, VI, IX; beginners should cover these with as few digressions 
as possible. Chapter II is designed to develop the student’s technique 
and probabilistic intuition; some experience in its contents is desirable, 
but it is not necessary to cover the chapter systematically: it may 
prove more profitable to return to the elementary illustrations as occa- 
sion arises at later stages. For the purposes of a first introduction, the 
restriction to discrete distributions should not be a serious handicap 
since the elementary theory of continuous distributions requires only 
a few words of supplementary explanation. 

From chapter IX an introductory course may proceed directly to 
chapter XI, considering generating functions as an example of more
general transforms. Chapter XI should be followed by some applications
in chapters XIII (recurrent events) or XII (chain reactions,
infinitely divisible distributions). Without generating functions it is
possible to turn in one of the following directions: limit theorems and 
fluctuation theory (chapters VIII, X, III); stochastic processes (chap- 
ter XVII); random walks (chapter III and the main part of XIV). 
These chapters are almost independent of each other. The Markov 
chains of chapter XV depend conceptually on recurrent events, but 
they may be studied independently if the reader is willing to accept 
without proof the basic ergodic theorem. 


Space saved by streamlining made it possible to add new material 
and to integrate the old third chapter with chapter II. New emphasis 
is laid on waiting times, a topic now serving as a unifying thread 
throughout the book. This emphasis is reflected in the early intro- 
duction of waiting times in chapter II and in the several independent 
treatments of the first-passage times in random walks. 

Chapter III is entirely new. It illustrates the power of combinatorial 
methods by deriving in an elementary way important results previously 
obtained by advanced analytical tools. The results concerning fluctua- 
tions in coin tossing show that widely held beliefs about the law of 
large numbers are fallacious. These results are so amazing and so at 
variance with common intuition that even sophisticated colleagues 
doubted that coins actually misbehave as theory predicts. The record 
of a simulated experiment is therefore included in section 7. 

A new stress on the essential unity of recurrent events and Markov 
chains permitted improvements and simplifications, but at the cost of 
a change from the terminology of the first edition. I am deeply apolo- 
getic for the confusion which is bound to ensue. 

Great care has been taken to render the index usable, but it cannot
serve as a Who's Who in probability: the proper balance is destroyed
by references to all papers that chanced to lead, often indirectly, to 
the construction of an example or exercise. I regret that sometimes 
important contributions are quoted in an irrelevant context not indica- 
tive of their value. 

This edition was prepared under ideal working conditions without
interruptions by routine duties. For this ease I must thank the Air 
Force Office of Scientific Research, Princeton University, and the 
stimulating hospitality of J. Wolfowitz. I have continued to benefit 
from the helpful criticism of J. L. Doob. The careful checking of 
manuscript and proofs by my wife has removed many errors and effects 
of chance. 


WILLIAM FELLER 
August 1957 


Preface to the First Edition 


IT WAS THE AUTHOR’S ORIGINAL INTENTION TO WRITE A BOOK ON 
analytical methods in probability theory in which the latter was to be 
treated as a topic in pure mathematics. Such a treatment would have 
been more uniform and hence more satisfactory from an aesthetic 
point of view; it would also have been more appealing to pure mathe- 
maticians. However, the generous support by the Office of Naval Re- 
search of work in probability theory at Cornell University led the 
author to a more ambitious and less thankful undertaking of satisfying 
heterogeneous needs. 

It is the purpose of this book to treat probability theory as a self- 
contained mathematical subject rigorously, avoiding non-mathematical 
concepts. At the same time, the book tries to describe the empirical 
background and to develop a feeling for the great variety of practical 
applications. This purpose is served by many special problems, nu- 
merical estimates, and examples which interrupt the main flow of the 
text. They are clearly set apart in print and are treated in a more 
picturesque language and with less formality. A number of special 
topics have been included in order to exhibit the power of general 
methods and to increase the usefulness of the book to specialists in 
various fields. To facilitate reading, detours from the main path are 
indicated by stars. The knowledge of starred sections is not assumed 
in the remainder. 

A serious attempt has been made to unify methods. The specialist 
will find many simplifications of existing proofs and also new results. 
In particular, the theory of recurrent events has been developed for the 
purpose of this book. It leads to a new treatment of Markov chains 
which permits simplification even in the finite case. 

The examples are accompanied by about 340 problems mostly with 
complete solutions. Some of them are simple exercises, but most of 
them serve as additional illustrative material to the text or contain 
various complements. One purpose of the examples and problems is 
to develop the reader’s intuition and art of probabilistic formulation.
Several previously treated examples show that apparently difficult
problems may become almost trite once they are formulated in a
natural way and put into the proper context. 

There is a tendency in teaching to reduce probability problems to 
pure analysis as soon as possible and to forget the specific character- 
istics of probability theory itself. Such treatments are based on a 
poorly defined notion of random variables usually introduced at the 
outset. This book goes to the other extreme and dwells on the notion 
of sample space, without which random variables remain an artifice. 

In order to present the true background unhampered by measura- 
bility questions and other purely analytic difficulties this volume is 
restricted to discrete sample spaces. This restriction is severe, but 
should be welcome to non-mathematical users. It permits the inclusion 
of special topics which are not easily accessible in the literature. At 
the same time, this arrangement makes it possible to begin in an ele- 
mentary way and yet to include a fairly exhaustive treatment of such 
advanced topics as random walks and Markov chains. The general 
theory of random variables and their distributions, limit theorems, 
diffusion theory, etc., is deferred to a succeeding volume. 

This book would not have been written without the support of the 
Office of Naval Research. One consequence of this support was a
fairly regular personal contact with J. L. Doob, whose constant criti- 
cism and encouragement were invaluable. To him go my foremost 
thanks. The next thanks for help are due to John Riordan, who 
followed the manuscript through two versions. Numerous corrections 
and improvements were suggested by my wife, who read both the
manuscript and proof.

The author is also indebted to K. L. Chung, M. Donsker, and 
S. Goldberg, who read the manuscript and corrected various mistakes; 
the solutions to the majority of the problems were prepared by S. 
Goldberg. Finally, thanks are due to Kathryn Hollenbach for patient 
and expert typing help; to E. Elyash, W. Hoffman, and J. R. Kinney 
for help in proofreading. 


WILLIAM FELLER 
Cornell University
January 1950


INTRODUCTION


The Nature 
of Probability Theory 


1. THE BACKGROUND 


Probability is a mathematical discipline with aims akin to those, 
for example, of geometry or analytical mechanics. In each field we 
must carefully distinguish three aspects of the theory: (a) the formal 
logical content, (b) the intuitive background, (c) the applications. 
The character, and the charm, of the whole structure cannot be ap- 
preciated without considering all three aspects in their proper relation. 


(a) Formal Logical Content 


Axiomatically, mathematics is concerned solely with relations among 
undefined things. This property is well illustrated by the game of 
chess. It is impossible to “define” chess otherwise than by stating a 
set of rules. The conventional shape of the pieces may be described 
to some extent, but it will not always be obvious which piece is in- 
tended for “king.” The chessboard and the pieces are helpful, but 
they can be dispensed with. The essential thing is to know how the 
pieces move and act. It is meaningless to talk about the “definition” 
or the “true nature” of a pawn or a king. Similarly, geometry does 
not care what a point and a straight line “really are.” They remain 
undefined notions, and the axioms of geometry specify the relations 
among them: two points determine a line, etc. These are the rules,
and there is nothing sacred about them. We change the axioms to
study different forms of geometry, and the logical structure of the
several non-Euclidean geometries is independent of their relation to
reality. Physicists have studied the motion of bodies under laws of
attraction different from Newton’s, and such studies are meaningful
even if Newton’s law of attraction is accepted as true in nature.


(b) Intuitive Background


In contrast to chess, the axioms of geometry and of mechanics refer 
to an existing intuitive background. In fact, geometrical intuition is 
so strong that it is prone to run ahead of logical reasoning. The extent 
to which logic, intuition, and physical experience are interdependent is 
a problem into which we need not enter. Certainly intuition can be
trained and developed. The bewildered novice in chess moves cau- 
tiously, recalling individual rules, whereas the experienced player ab- 
sorbs a complicated situation at a glance and is unable to account ra- 
tionally for his intuition. In like manner mathematical intuition grows 
with experience, and it is possible to develop a natural feeling for
concepts such as a four-dimensional space.

Even the collective intuition of mankind appears to progress.
Newton’s notions of a field of force and of action at a distance and
Maxwell’s concept of electromagnetic “waves” were at first decried as
“unthinkable” and “contrary to intuition.” Modern technology and radio
in the homes have popularized these notions to such an extent that
they form part of the ordinary vocabulary. Similarly, the modern
student has no appreciation of the modes of thinking, the prejudices,
and other difficulties against which the theory of probability had to
struggle when it was new. Nowadays newspapers report on samples
of public opinion, and the magic of statistics embraces all phases of life
to the extent that young girls watch the statistics of their chances to
get married. Thus everyone has acquired a feeling for the meaning of
statements such as “the chances are three in five.” Vague as it is,
this intuition serves as background and guide for the first step. It will
be developed as the theory progresses and acquaintance is made with
more sophisticated applications.


(c) Applications


[...] in practice identified [...] flexible and variable
that no general rules can be given. The notion of a rigid body [...]
rigid depends on the circumstances and the desired degree of
approximation. Rubber [...]

In applications, the abstract mathematical models serve as tools,
and different models can describe the same empirical situation. The
manner in which mathematical theories are applied does not depend on
preconceived ideas; it is a purposeful technique depending on, and changing
with, experience. A philosophical analysis of such techniques is a legiti- 
mate study, but it is not within the realm of mathematics, physics, or 
statistics. The philosophy of the foundations of probability must be
divorced from mathematics and statistics, exactly as the discussion of
our intuitive space concept is now divorced from geometry. 


2. PROCEDURE 


The history of probability (and of mathematics in general) shows a 
stimulating interplay of theory and applications; theoretical progress 
opens new fields of applications, and in turn applications lead to new 
problems and fruitful research. The theory of probability is now ap- 
plied in many diverse fields, and we require the flexibility of a general 
theory to provide appropriate tools for so great a variety of needs. 
We must therefore withstand the temptation (and the pressure) to
build the theory, its terminology, and its arsenal too close to one par- 
ticular sphere of interest. We wish instead to develop a mathematical 
theory in the established way which has proved so successful in geom- 
etry and mechanics. 

We shall start from the simplest experiences such as tossing a coin 
or throwing dice, where all statements have an obvious intuitive mean- 
ing. This intuition will be translated into an abstract model to be 
generalized gradually and by degrees. Illustrative examples will be 
provided to explain the empirical background of the several models 
and to develop the reader’s intuition, but the theory itself will be of a 
mathematical character. We shall no more attempt to explain the 
“true meaning” of probability than the modern physicist dwells on the 
“real meaning” of mass and energy or the geometer discusses the nature
of a point. Instead, we shall prove theorems and show how they are 
applied. 

At the outset the purpose of the theory of probability was to describe 
the exceedingly narrow domain of experience connected with games of 
chance, the main effort being directed to the calculation of certain 
probabilities. In the opening chapters we too shall calculate a few
typical probabilities, but it should be borne in mind that numerical
probabilities are not the principal object of the theory. Its aim is to
discover general laws and to construct satisfactory theoretical models.

Probabilities play for us the same role as masses in mechanics. The
motion of the planetary system can be discussed without knowledge 
of the individual masses and without contemplating methods for their 
actual measurements. Even non-existent planetary systems may be 
the object of a profitable and illuminating study. Similarly, practical
and useful probability models may refer to non-observable worlds. For
example, billions of dollars have been invested in automatic telephone
exchanges. These are based on simple probability models in which
various possible systems are compared. The theoretically best system
is built and the others will never exist. In insurance, probability
theory is used to calculate the probability of ruin; that is, the theory
is used to avoid certain undesirable situations, and consequently it
applies to situations that are not actually observed. Probability theory
would be effective and useful even if not a single numerical value were
accessible.


3. “STATISTICAL” PROBABILITY


The success of the modern mathematical theory of probability is 
bought at a price: the theory is limited to one particular aspect of 
“chance.” The intuitive notion of probability is connected with
inductive reasoning and with judgments such as “Paul is probably a
happy man,” “Probably this book will be a failure,” “Fermat’s
conjecture is probably false.” Judgments of this sort are of interest to
the philosopher and the logician, and they are a legitimate object of a
mathematical theory.¹ It must be understood, however, that we are
concerned not with modes of inductive reasoning but with something
that might be called physical or statistical probability. In a rough way
we may characterize this concept by saying that our probabilities do
not refer to judgments but to possible outcomes of a conceptual
experiment. Before we speak of probabilities, we must agree on an idealized
model of a particular conceptual experiment such as tossing a coin,
sampling kangaroos on the moon, observing a particle under diffusion,
counting the number of telephone calls. At the outset we must agree
on the possible outcomes of this experiment (our sample space) and the
probabilities associated with them. This is analogous to the procedure
in mechanics where fictitious models involving two, three, or seventeen
mass points are introduced, these points being devoid of individual
properties. Similarly, in analyzing the coin tossing game we are not
concerned with the accidental circumstances of an actual experiment:
the objects of our theory are sequences (or arrangements) of symbols
such as “head, head, tail, head, ....” There is no place in our system
for speculations concerning the probability that the sun will rise
tomorrow. Before speaking of it we should have to agree on an (idealized)
model which would presumably run along the lines “out of infinitely
many worlds one is selected at random. . . .” Little imagination is
required to construct such a model, but it appears both uninteresting
and meaningless.

¹ B. O. Koopman, The axioms and algebra of intuitive probability, Annals of
Mathematics (2), vol. 41 (1940), pp. 269-292, and The bases of probability,
Bulletin of the American Mathematical Society, vol. 46 (1940), pp. 763-774.

The astronomer speaks of measuring the temperature at the center 
of the sun or of travel to Sirius. These operations seem impossible, 
and yet it is not senseless to contemplate them. By the same token, 
we shall not worry whether or not our conceptual experiments can be 
performed; we shall analyze abstract models. In the back of our minds 
we keep an intuitive interpretation of probability which gains
operational meaning in certain applications. We imagine the experiment
performed a great many times. An event with probability 0.6 should
be expected, in the long run, to occur sixty times out of a hundred.
This description is deliberately vague but supplies a picturesque
intuitive background sufficient for the more elementary applications. As
the theory proceeds and grows more elaborate, the operational meaning
and the intuitive picture will become more concrete.
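
The long-run interpretation of a probability such as 0.6 can be tried out
numerically. The following Python sketch is an illustration only and is not
part of the argument: the function name relative_frequency, the use of a
pseudo-random generator in place of a real experiment, and the fixed seed are
all assumptions made for the example.

```python
import random


def relative_frequency(p, trials, seed=0):
    """Fraction of `trials` simulated repetitions in which an event
    of probability `p` occurs (pseudo-random, reproducible via `seed`)."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so it falls below p with probability p.
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


# The observed fraction fluctuates for short runs and settles near 0.6
# as the number of repetitions grows.
for trials in (100, 10_000, 100_000):
    print(trials, relative_frequency(0.6, trials))
```

Short runs may well give fifty-five or sixty-six occurrences out of a hundred;
the statement "sixty times out of a hundred" describes only the long run.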


4. SUMMARY


We shall be concerned with theoretical models in which probabilities 
enter as free parameters in much the same way as masses in mechanics. 
They are applied in many and variable ways. The technique of appli- 
cations and the intuition develop with the theory. 

This is the standard procedure accepted and fruitful in other mathe- 
matical disciplines. No alternative has been devised which could con- 
ceivably fill the manifold needs and requirements of all branches of the 
growing entity called probability theory and its applications.

We may fairly lament that intuitive probability is insufficient for 
scientific purposes, but it is a historical fact. In example 1(6.b), we 
shall discuss random distributions of particles in compartments. The 
appropriate, or “natural,” probability distribution seemed perfectly 
clear to everyone and had been accepted without hesitation by physi- 
cists. It turned out, however, that physical particles are not trained 
in human common sense, and the “natural” (or Boltzmann) distribu- 
tion had to be given up for the Einstein-Bose distribution in some 
cases, for the Fermi-Dirac distribution in others. No intuitive argu- 
ment has been offered why photons should behave differently from 
protons and why they do not obey the “a priori” laws. If a
justification could now be found, it would only show that intuition develops
with theory. At any rate, even for applications freedom and flexibility 
are essential, and it would be pernicious to fetter the theory to fixed 
poles.

It has also been claimed that the modern theory of probability is too 
abstract and too general to be useful. This is the battle cry once 
raised by practical-minded people against Maxwell’s field theory. The 
argument could be countered by pointing to the unexpected new appli- 
cations opened by the abstract theory of stochastic processes or to the 
new insights offered by the modern fluctuation theory which once more
belies intuition and is leading to a revision of practical attitudes. 
However, the discussion is useless; it is too easy to condemn. Only 
yesterday the practical things of today were decried as impractical, 
and the theories which will be practical tomorrow will always be 
branded as valueless games by the practical men of today. 


5. HISTORICAL NOTE 


The statistical, or empirical, attitude toward probability has been 
developed mainly by R. A. Fisher and R. von Mises. The notion of 
sample space² comes from von Mises. This notion made it possible to
build up a strictly mathematical theory of probability based on
measure theory. Such an approach has emerged gradually in the ’twenties
under the influence of many authors. An axiomatic treatment
representing the modern development was given by A. Kolmogorov.³ We
shall follow this line, but the term axiom appears too solemn inasmuch
as the present volume deals only with the simple case of discrete
probabilities.


² See his book, Wahrscheinlichkeitsrechnung, Leipzig and Wien, 1931, with
references to his original papers dating back to about 1921. The German word is
Merkmalraum (label space).

³ A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, fasc. 3 of vol.
2 of Ergebnisse der Mathematik, Berlin, 1933.


CHAPTER I 


The Sample Space 


1. THE EMPIRICAL BACKGROUND 


The mathematical theory of probability gains practical value and an 
intuitive meaning in connection with real or conceptual experiments 
such as tossing a coin once, tossing a coin 100 times, throwing three 
dice, arranging a deck of cards, matching two decks of cards, playing 
roulette, observing the life span of a radioactive atom or a person, 
selecting a random sample of people and observing the number of left- 
handers in it, crossing two species of plants and observing the pheno- 
types of the offspring; or with phenomena such as the sex of a newborn 
baby, the number of busy trunklines in a telephone exchange, the 
number of calls on a telephone, random noise in an electrical com- 
munication system, routine quality control of a production process, 
frequency of accidents, the number of double stars in a region of the 
skies, the position of a particle under diffusion. All these descriptions 
are rather vague, and, in order to render the theory meaningful, we 
have to agree on what we mean by possible results of the experiment or 
observation in question.

When a coin is tossed, it does not necessarily fall heads or tails; it
can roll away or stand on its edge. Nevertheless, we shall agree to
regard “head” and “tail” as the only possible outcomes of the
experiment. This convention simplifies the theory without affecting its
applicability. Idealizations of this type are standard practice. It is
impossible to measure the life span of an atom or a person without some
error, but for theoretical purposes it is expedient to imagine that these 
quantities are exact numbers. The question then arises as to which 
numbers can actually represent the life span of a person. Is there a 
maximal age beyond which life is impossible, or is any age conceivable? 
We hesitate to admit that man can grow 1000 years old, and yet cur- 
rent actuarial practice admits no bounds to the possible duration of 
life. According to formulas on which modern mortality tables are
based, the proportion of men surviving 1000 years is of the order of
magnitude of one in $10^{10^{36}}$, a number with $10^{27}$ billions of zeros. This
statement does not make sense from a biological or sociological point
of view, but considered exclusively from a statistical standpoint it
certainly does not contradict any experience. There are fewer than $10^{10}$
people born in a century. To test the contention statistically, more
than $10^{10^{35}}$ centuries would be required, which is considerably more
than $10^{10^{34}}$ lifetimes of the earth. Obviously, such extremely small
probabilities are compatible with our notion of impossibility. Their
use may appear utterly absurd, but it does no harm and is convenient
in simplifying many formulas. Moreover, if we were seriously to
discard the possibility of living 1000 years, we should have to accept the
existence of a maximum age, and the assumption that it should be
possible to live x years and impossible to live x years and two seconds is
as unappealing as the idea of unlimited life. 
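
The orders of magnitude in this paragraph can be checked with exact integer
arithmetic on the exponents. The sketch below is hedged in two respects: it
reads the survival proportion as one in 10^(10^36), and it assumes, purely for
illustration, that a lifetime of the earth is about 10^8 centuries. The
variable names are likewise inventions of the example.

```python
# Every quantity here is a power of ten, so only the exponents are stored.
# Python integers are exact at any size, so no rounding occurs.

LOG10_ODDS = 10**36              # survival proportion: one in 10^(10^36)
LOG10_BIRTHS_PER_CENTURY = 10    # fewer than 10^10 births per century
LOG10_CENTURIES_PER_EARTH = 8    # ASSUMPTION: earth's lifetime ~ 10^8 centuries

# 10^(10^36) is a 1 followed by 10^36 zeros; express that count in billions.
zeros_in_billions = LOG10_ODDS // 10**9

# Observing about 10^(10^36) people at fewer than 10^10 per century takes
# more than 10^(10^36 - 10) centuries.
log10_centuries = LOG10_ODDS - LOG10_BIRTHS_PER_CENTURY
log10_earth_lifetimes = log10_centuries - LOG10_CENTURIES_PER_EARTH

print(zeros_in_billions == 10**27)      # the count of zeros, in billions
print(log10_centuries > 10**35)         # more than 10^(10^35) centuries
print(log10_earth_lifetimes > 10**34)   # more than 10^(10^34) earth lifetimes
```

Under these assumptions each of the three comparisons prints True.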

Any theory necessarily involves idealization, and our first idealiza- 
tion concerns the possible outcomes of an “experiment” or “observa- 
tion.” If we want to construct an abstract model, we must at the 
outset reach a decision about what constitutes a possible outcome of 
the (idealized) experiment. 

For uniform terminology, the results of experiments or observations 
will be called events. Thus we shall speak of the event that of five coins 
tossed more than three fell heads. Similarly, the “experiment” of
distributing the cards in bridge¹ may result in the “event” that North
has two aces. The composition of a sample (“two left-handers in a
sample of 85”) and the result of a measurement (“temperature 120°,”
“seven trunklines busy”) will each be called an event.

We shall distinguish between compound (or decomposable) and simple 
(or indecomposable) events. For example, saying that a throw with 
two dice resulted in “sum six” amounts to saying that it resulted in 
“(1, 5) or (2, 4) or (3, 3) or (4, 2) or (5, 1),” and this enumeration de- 
composes the event “sum six” into five simple events. Similarly, the 
event “two odd faces” admits of the decomposition “(1, 1) or (1, 3) or 
... or (5, 5)” into nine simple events. Note that if a throw results in
(3, 3), then the same throw results also in the events “sum six” and
“two odd faces”; these events are not mutually exclusive and hence
may occur simultaneously. As a second example consider the age of
a person. Every particular value x represents a simple event, whereas
the statement that a person is in his fifties describes the compound
event that x lies between 50 and 60. In this way every compound
event can be decomposed into simple events, that is to say, a
compound event is an aggregate of certain simple events.

¹ Definition of bridge and poker. A deck of bridge cards consists of 52 cards
arranged in four suits of thirteen each. There are thirteen face values (2, 3, ..., 10,
jack, queen, king, ace) in each suit. The four suits are called spades, clubs, hearts,
diamonds. The last two are red, the first two black. Cards of the same face
value are called of the same kind. For our purposes, playing bridge means
distributing the cards to four players, to be called North, South, East, and West (or
N, S, E, W, for short) so that each receives thirteen cards. Playing poker, by
definition, means selecting five cards out of the pack.

If we want to speak about “experiments” or “observations” in a
theoretical way and without ambiguity, we must first agree on the
simple events representing the thinkable outcomes; they define the
idealized experiment. It is usual to refer to these simple events as sample
points, or points for short. By definition, every indecomposable result of
the (idealized) experiment is represented by one, and only one, sample
point. The aggregate of all sample points will be called the sample
space. All events connected with a given (idealized) experiment can
be described in terms of sample points. 
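
The decomposition of compound events into sample points, as in the throw
with two dice discussed earlier, can be sketched in a few lines of Python.
This is an illustration under stated assumptions rather than part of the
theory: the names sample_space, sum_six, and two_odd_faces are chosen for the
example, and a throw is represented as an ordered pair of faces.

```python
from itertools import product

# Sample space for a throw with two dice: the 36 ordered pairs of faces.
# Each pair is one sample point (simple event).
sample_space = list(product(range(1, 7), repeat=2))

# A compound event is the aggregate of the sample points that realize it.
sum_six = {pt for pt in sample_space if pt[0] + pt[1] == 6}
two_odd_faces = {pt for pt in sample_space if pt[0] % 2 == 1 and pt[1] % 2 == 1}

print(sorted(sum_six))                      # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
print(len(two_odd_faces))                   # 9
print((3, 3) in (sum_six & two_odd_faces))  # True: the events are not mutually exclusive
```

Representing events as sets makes "both occur" an intersection and "either
occurs" a union of aggregates of sample points.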

Before formalizing these basic conventions, we proceed to discuss a 
few typical examples which will play a role further on. 


2. EXAMPLES 


(a) Distribution of three balls in three cells. Table 1 describes all
possible outcomes of the “experiment” of placing three balls into three
cells.


TABLE 1

 1. { abc | -   | -   }   10. { a   | bc  | -   }   19. { -   | a   | bc  }
 2. { -   | abc | -   }   11. { b   | ac  | -   }   20. { -   | b   | ac  }
 3. { -   | -   | abc }   12. { c   | ab  | -   }   21. { -   | c   | ab  }
 4. { ab  | c   | -   }   13. { a   | -   | bc  }   22. { a   | b   | c   }
 5. { ac  | b   | -   }   14. { b   | -   | ac  }   23. { a   | c   | b   }
 6. { bc  | a   | -   }   15. { c   | -   | ab  }   24. { b   | a   | c   }
 7. { ab  | -   | c   }   16. { -   | ab  | c   }   25. { b   | c   | a   }
 8. { ac  | -   | b   }   17. { -   | ac  | b   }   26. { c   | a   | b   }
 9. { bc  | -   | a   }   18. { -   | bc  | a   }   27. { c   | b   | a   }

Each of these arrangements represents a simple event, that is, a
sample point. The event A “one cell is multiply occupied” is realized
in the arrangements numbered 1-21, and we express this by saying
that the event A is the aggregate of the sample points 1-21. Similarly,
the event B “first cell is not empty” is the aggregate of the sample
points 1, 4-15, 22-27. The event C defined by “both A and B occur”
is the aggregate of the thirteen sample points 1, 4-15. In this
particular example it so happens that each of the 27 points belongs to either
A or B (or to both); therefore the event “either A or B or both occur” 
is the entire sample space and occurs with absolute certainty. The 
event D defined by “A does not occur” consists of the points 22-27 
and can be described by the condition that no cell remains empty. 
The event “first cell empty and no cell multiply occupied” is impos- 
sible (does not occur) since no sample point satisfies these specifica- 
tions. 
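
The counts quoted for the events A, B, C, and D can be verified by brute-force
enumeration. The Python sketch below is one possible encoding and not the
only one: it assumes a sample point is recorded as a triple giving the cell
(0, 1, or 2) of each of the balls a, b, c, and the helper name occupancy is
invented for the example. The points are not numbered as in the table; only
the sizes of the events are compared.

```python
from itertools import product

BALLS = "abc"
N_CELLS = 3

# A sample point assigns a cell index to each ball: (cell of a, cell of b, cell of c).
sample_space = list(product(range(N_CELLS), repeat=len(BALLS)))


def occupancy(point):
    """Number of balls in each cell for one arrangement."""
    return [point.count(cell) for cell in range(N_CELLS)]


A = {p for p in sample_space if max(occupancy(p)) >= 2}  # one cell multiply occupied
B = {p for p in sample_space if occupancy(p)[0] >= 1}    # first cell not empty
C = A & B                                                # both A and B occur
D = set(sample_space) - A                                # A does not occur
impossible = {p for p in D if occupancy(p)[0] == 0}      # first cell empty, none multiple

print(len(sample_space), len(A), len(B), len(C), len(D))  # 27 21 19 13 6
print(len(A | B) == len(sample_space))                    # True: A or B is certain
print(len(impossible))                                    # 0
```

Here the impossible event appears simply as the empty set of sample points.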

(b) Distribution of r balls in n cells. The more general case of r balls 
in n cells can be studied in the same manner, except that the number of
possible arrangements increases rapidly with r and n. For r = 3 balls
in n = 4 cells, the sample space contains already 64 points, and for
r = n = 10 there are $10^{10}$ sample points; a complete tabulation would
require some hundred thousand big volumes. 
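
The growth of the sample space is easy to tabulate. The sketch below assumes only that each of the r distinguishable balls may go independently into any of the n cells, which gives n^r arrangements; the function name is ours.

```python
def number_of_arrangements(r, n):
    """Number of ways of placing r distinguishable balls into n cells."""
    return n ** r

for r, n in [(3, 3), (3, 4), (10, 10)]:
    print(f"r = {r:2d}, n = {n:2d}: {number_of_arrangements(r, n):,} sample points")
```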

We use this example to illustrate the important fact that the nature
of the sample points is irrelevant for our theory. To us the sample
space (together with the probability distribution defined in it) defines
the idealized experiment. We use the picturesque language of balls
and cells, but the same sample space admits of a great variety of
different practical interpretations. To clarify this point, and also for
further reference, we list here a number of situations in which the intuitive
background varies; all are, however, abstractly equivalent to the scheme of
placing r balls into n cells, in the sense that the outcomes differ only in
their verbal description. The appropriate assignment of probabilities is
not the same in all cases and will be discussed later on.


(b, 1). Birthdays. The possible configurations of the birthdays of
r people correspond to the different arrangements of r balls in
n = 365 cells (assuming the year to have 365 days).

(b, 2). Accidents. Classifying r accidents according to the weekdays
when they occurred is equivalent to placing r balls into n = 7
cells.

(b, 3). In firing at n targets, the hits correspond to balls, the targets
to cells.

(b, 4). Sampling. Let a group of r people be classified according
to, say, age or profession. The classes play the role of our cells, the
people that of balls.

(b, 5). Irradiation in biology. When the cells in the retina of the
eye are exposed to light, the light particles play the role of balls,
and the actual cells are the “cells” of our model. Similarly, in the
study of the genetic effect of irradiation, the chromosomes correspond
to the cells of our model and α-particles to the balls.

(b, 6). In cosmic ray experiments the particles hitting the Geiger
counters represent the balls, and the counters function as cells. 

(b, 7). An elevator starts with r passengers and stops at n floors. 
The different arrangements of discharging the passengers are replicas 
of the different distributions of r balls in n cells.

(b, 8). Dice. The possible outcomes of a throw with r dice corre- 
spond to placing r balls into n = 6 cells. When tossing a coin we are 
in effect dealing with only n = 2 cells. 

(b, 9). Random digits. The possible orderings of a sequence of r 
digits correspond to the distribution of r balls (= places) into ten 
cells called 0, 1, ..., 9. 

(b, 10). The sex distribution of r persons. Here we have n = 2 
cells and r balls. 

(b, 11). Coupon collecting. The different kinds of coupons repre- 
sent the cells; the coupons collected represent the balls. 

(b, 12). Aces in bridge. The four players represent four cells, and 
we have r = 4 balls. 

(b, 13). Gene distributions. Each descendant of an individual (person,
plant, or animal) inherits from the progenitor certain genes. If
a particular gene can appear in n forms A_1, ..., A_n, then the
descendants may be classified according to the type of the gene. The
descendants correspond to the balls, the genotypes A_1, ..., A_n to
the cells.

(b, 14). Chemistry. Suppose that a long chain polymer reacts with 
oxygen. An individual chain may react with 0, 1, 2, ... oxygen 
molecules. Here the reacting oxygen molecules play the role of balls 
and the polymer chains the role of cells into which the balls are put. 

(b, 15). Theory of photographic emulsions. A photographic plate 
is covered with grains sensitive to light quanta: a grain reacts if it 
is hit by a certain number, r, of quanta. For the theory of black-
white contrast we must know how many cells are likely to be hit by 
the r quanta. We have here an occupancy problem where the grains 
correspond to cells, and the light quanta to balls. (Actually the 
situation is more complicated since a plate usually contains grains 
of different sensitivity.) 

(b, 16). Misprints. The possible distributions of r misprints in 
the n pages of a book correspond to all the different distributions of 
r balls in n cells, provided r is smaller than the number of letters per
page.

(c) The case of indistinguishable balls. Let us return to example (a)
and suppose that the three balls are not distinguishable. This means
that we no longer distinguish between three arrangements such as 4, 5,
6, and thus table 1 reduces to table 2. The latter defines the sample
space of the ideal experiment which we call “placing three indistinguishable
balls into three cells,” and a similar procedure applies to the
case of r balls in n cells.


TABLE 2
 1. {***| - | - }    6. { * |** | - }
 2. { - |***| - }    7. { * | - |** }
 3. { - | - |***}    8. { - |** | * }
 4. {** | * | - }    9. { - | * |** }
 5. {** | - | * }   10. { * | * | * }

Whether or not actual balls are in practice distinguishable is irrelevant 
for our theory. Even if they are, we may decide to treat them as indis- 
tinguishable. The aces in bridge [example (b, 12)] or the people in an 
elevator [example (b, 7)] certainly are distinguishable and yet it is often 
preferable to treat them as indistinguishable. The dice of example 
(b, 8) may be colored to make them distinguishable, but whether in 
discussing a particular problem we use the model of distinguishable or 
indistinguishable balls is purely a matter of purpose and convenience. 
The nature of a concrete problem may dictate the choice, but under 
any circumstances our theory begins only after the appropriate model 
has been chosen, that is, after the sample space has been defined. 

In the scheme above we have considered indistinguishable balls, but 
table 2 still refers to a first, second, third cell, and their order is 
essential. We can go a step further and assume that even the cells 
are indistinguishable (for example, the cell may be chosen at random 
without regard to its contents). With both balls and cells indistinguishable,
only three different arrangements are possible, namely
{*** | - | - }, {** | * | - }, {* | * | *}.
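
The successive reductions of the sample space (27 points, then 10, then 3) can be reproduced mechanically. The sketch below is our own illustration: a placement is a triple of cell indices, forgetting the identity of the balls leaves the occupancy numbers, and forgetting the order of the cells leaves the sorted occupancy numbers.

```python
from itertools import product

# The 27 points of table 1: a cell index (0, 1 or 2) for each of three balls.
placements = list(product(range(3), repeat=3))

# Balls indistinguishable: only the number of balls per cell matters.
occupancies = {tuple(p.count(c) for c in range(3)) for p in placements}

# Cells indistinguishable as well: the order of the occupancy numbers is dropped.
patterns = {tuple(sorted(o, reverse=True)) for o in occupancies}

print(len(placements), len(occupancies), len(patterns))
print(sorted(patterns, reverse=True))
```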

(d) Sampling. Suppose that a sample of 100 people is taken in order
to estimate how many people smoke. The only property of the sample
of interest in this connection is the number x of smokers; this may be
any integer between 0 and 100. In this case we may agree that
our sample space consists of the 101 “points” 0, 1, ..., 100. Every
particular sample or observation is completely described by stating
the corresponding point x. An example of a compound event is the
result that “the majority of the people sampled are smokers.” This
means that the experiment resulted in one of the fifty simple events
51, 52, ..., 100, but it is not stated in which. Similarly, every property
of the sample can be described in enumerating the corresponding cases
or sample points. For uniform terminology we speak of events rather
than properties of the sample. Mathematically, an event is simply 
the aggregate of the corresponding sample points. 

(e) Sampling (continued). Suppose now that the 100 people in our 
sample are classified not only as smokers or non-smokers but also as 
males or females. The sample may now be characterized by a quadruple
(M_s, F_s, M_n, F_n) of integers giving in order the number of male
and female smokers, male and female non-smokers. We can take for
sample points the quadruples of integers lying between 0 and 100 and
adding to 100. There are 176,851 such quadruples, and they constitute
the sample space (cf. chapter II, section 5). The event “relatively
more males than females smoke” means that in our sample the ratio
M_s/M_n is greater than F_s/F_n. The point (73, 2, 8, 17) has this property,
but (0, 1, 50, 49) has not. Our event can be described in principle
by enumerating all quadruples with the desired property. 
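
The number of quadruples and the two sample points mentioned in example (e) can be checked directly. This is a sketch under one assumption of ours: the ratios are compared by cross-multiplication, so that no division by zero occurs when a class is empty; the text does not say how such cases are to be treated.

```python
def count_quadruples(total=100):
    """Count quadruples of non-negative integers adding to `total`."""
    count = 0
    for ms in range(total + 1):
        for fs in range(total + 1 - ms):
            # M_n may be 0, 1, ..., total - ms - fs; F_n is then determined.
            count += total - ms - fs + 1
    return count

def relatively_more_males_smoke(ms, fs, mn, fn):
    """M_s/M_n > F_s/F_n, written as M_s * F_n > F_s * M_n (our convention)."""
    return ms * fn > fs * mn

print(count_quadruples())
print(relatively_more_males_smoke(73, 2, 8, 17))
print(relatively_more_males_smoke(0, 1, 50, 49))
```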

(f) Coin tossing. For the experiment of tossing a coin three times, 
the sample space consists of eight points which may conveniently be 
represented by HHH, HHT, HTH, THH, HTT, THT, TTH, TTT.
The event A, “two or more heads,” is the aggregate of the first four 
points. The event B, “just one tail,” means either HHT, or HTH, or 
THH; we say that B contains these three points. 

(g) Ages of a couple. An insurance company is interested in the age 
distribution of couples. Let x stand for the age of the husband, y for 
the age of the wife. Each observation results in a number-pair (x, y). 
For the sample space corresponding to a single observation we take the 
first quadrant of the x, y-plane so that each point x > 0, y > 0 is a 
sample point. The event A, “husband is older than 40,” is represented 
by all points to the right of the line x = 40; the event B, “husband is 
older than wife,” is represented by the angular region between the 
x-axis and the bisector y = x, that is to say, by the aggregate of points
with x > y; the event C, “wife is older than 40,” is represented by the
portion of the first quadrant above the line y = 40. For a geometric
representation of the joint age distributions of two couples we would
require a four-dimensional space. 

(h) Phase space. In statistical mechanics, each possible “state” of a 
system is called a “point in phase space.” This is only a difference in 
terminology. The phase space is simply our sample space; its points 
are our sample points. 

3. THE SAMPLE SPACE. EVENTS 


It should be clear from the preceding that we shall never speak of 
probabilities except in relation to a given sample space (or, physically,
in relation to a certain conceptual experiment). We start with the notion
of a sample space and its points; from now on they will be considered given. 
They are the primitive and undefined notions of the theory precisely as the 
notions of “points” and “straight line” remain undefined in an axio- 
matic treatment of Euclidean geometry. The nature of the sample 
points does not enter our theory. The sample space provides a model 
of an ideal experiment in the sense that, by definition, every thinkable 
outcome of the experiment is completely described by one, and only one, 
sample point. It is meaningful to talk about an event A only when it 
is clear for every outcome of the experiment whether the event A has 
or has not occurred. The collection of all those sample points repre- 
senting outcomes where A has occurred completely describes the event. 
Conversely, any given aggregate A containing one or more sample 
points can be called an event; this event does, or does not, occur accord- 
ing as the outcome of the experiment is, or is not, represented by a 
point of the aggregate A. We therefore define the word event to mean 
the same as an aggregate of sample points. We shall say that an event A 
consists of (or contains) certain points, namely those representing out- 
comes of the ideal experiment in which A occurs. 


Example. In the sample space of example (2.a) consider the event
U consisting of the points number 1, 7, 13. This is a formal and
straightforward definition, but U can be described in many equivalent
ways. For example, U may be defined as the event that the following
three conditions are satisfied: (1) the second cell is empty, (2) the ball
a is in the first cell, (3) the ball b does not appear after c. Each of
these conditions itself describes an event. The event U_1 defined by
the condition (1) alone consists of points 1, 3, 7-9, 13-15. The event
U_2 defined by (2) consists of points 1, 4, 5, 7, 8, 10, 13, 22, 23, and the
event U_3 defined by (3) contains the points 1-4, 6, 7, 9-11, 13, 14, 16,
18-20, 22, 24, 25. The event U can also be described as the simultaneous
realization of all three events U_1, U_2, U_3.
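
That the three conditions together single out exactly the points 1, 7, 13 can be verified with ordinary sets. In this sketch the point numbers are those listed in the example above; the helper expand is our own shorthand for ranges such as 7-9.

```python
def expand(*parts):
    """Build a set of point numbers; a pair (lo, hi) stands for lo, ..., hi."""
    points = set()
    for part in parts:
        if isinstance(part, tuple):
            lo, hi = part
            points.update(range(lo, hi + 1))
        else:
            points.add(part)
    return points

U1 = expand(1, 3, (7, 9), (13, 15))                      # second cell is empty
U2 = expand(1, 4, 5, 7, 8, 10, 13, 22, 23)               # ball a is in the first cell
U3 = expand((1, 4), 6, 7, (9, 11), 13, 14, 16,
            (18, 20), 22, 24, 25)                        # b does not appear after c

U = U1 & U2 & U3   # simultaneous realization of all three events
print(sorted(U))
```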


The terms “sample point” and “event” have an intuitive appeal,
but they refer to the notions of point and point set common to all
parts of mathematics.

We have seen in the preceding example and in (2.a) that new events
can be defined in terms of two or more given events. With these
examples in mind we now proceed to introduce the notation of the
formal algebra of events (that is, algebra of point sets).




4. RELATIONS AMONG EVENTS


We shall now suppose that an arbitrary, but fixed, sample space S
is given.

Definition 1. We shall use the notation A = 0 to express that the 
event A contains no sample points (is impossible). The zero must be 
interpreted in a symbolic sense and not as the numeral. 

To every event A there corresponds another event defined by the 
condition “A does not occur.” It contains all points not contained in A. 


Definition 2. The event consisting of all points not contained in the 
event A will be called the complementary event (or negation) of A and will 
be denoted by A’. In particular, S’ = 0.


Figures 1 and 2. Illustrating relations among events. In Figure 1 the domain
within heavy boundaries is the union A ∪ B ∪ C. The triangular (heavily shaded)
domain is the intersection ABC. The moon-shaped (lightly shaded) domain is the
intersection of B with the complement of A ∪ C.


With any two events A and B we can associate two new events defined
by the conditions “both A and B occur” and “either A or B or
both occur.” These events will be denoted by AB and A ∪ B, respectively.
The event AB contains all sample points which are common
to A and B. If A and B exclude each other, then there are no points
common to A and B and the event AB is impossible; analytically, this
situation is described by the equation


(4.1) AB = 0


which should be read “A and B are mutually exclusive.” The event
AB’ means that both A and B’ occur or, in other words, that A but
not B occurs. Similarly, A’B’ means that neither A nor B occurs. The
event A ∪ B means that at least one of the events A and B occurs; it
contains all sample points except those that belong neither to A nor
to B.

In the theory of probability we can describe the event AB as the 
simultaneous occurrence of A and B. In standard mathematical ter- 
minology AB is called the (logical) intersection of A and B. Similarly, 
A ∪ B is the union of A and B. Our notion carries over to the case
of events A, B, C, D, .... 


Definition 3. To every collection A, B, C, ... of events we define two
new events as follows. The aggregate of the sample points which belong to
all the given sets will be denoted by ABC ... and called the intersection *
(or simultaneous realization) of A, B, C, .... The aggregate of sample
points which belong to at least one of the given sets will be denoted by
A ∪ B ∪ C ... and called the union (or realization of at least one) of the
given events. The events A, B, C, ... are mutually exclusive if no two
have a point in common, that is, if AB = 0, AC = 0, ..., BC = 0, ....


We still require a symbol to express the statement that A cannot 
occur without B occurring, that is, that the occurrence of A implies the 
occurrence of B. This means that every point of A is contained in B. 
Think of intuitive analogies like the aggregate of all mothers, which
forms a part of the aggregate of all women: All mothers are women but
not all women are mothers.


Definition 4. The symbols A ⊂ B and B ⊃ A are equivalent and
signify that every point of A is contained in B; they are read, respectively,
“A implies B” and “B is implied by A”. If this is the case, we shall
also write B − A instead of BA’ to denote the event that B but not A occurs.


The event B − A contains all those points which are in B but not
in A. With this notation we can write A’ = S − A and A − A = 0.


Examples. (a) If A and B are mutually exclusive, then the occurrence
of A implies the non-occurrence of B and vice versa. Thus
AB = 0 means the same as A ⊂ B’ and as B ⊂ A’.

(b) The event A − AB means the occurrence of A but not of both
A and B. Thus A − AB = AB’.

(c) In the example (2.g), the event AB means that the husband is
older than 40 and older than his wife; AB’ means that he is older than
40 but not older than his wife. AB is represented by the infinite
trapezoidal region between the x-axis and the lines x = 40 and y = x, and
the event AB’ is represented by the angular domain between the lines
x = 40 and y = x, the latter boundary included. The event AC means
that both husband and wife are older than 40. The event A ∪ C
means that at least one of them is older than 40, and A ∪ B means
that the husband is either older than 40 or, if not that, at least older
than his wife (in official language, “husband’s age exceeds 40 years or
wife’s age, whichever is smaller”).


* The standard mathematical notation for the intersection of two or more sets
is A ∩ B or A ∩ B ∩ C, etc. This notation is more suitable for certain specific
purposes and is to be adopted in the second volume. At present we use the notation
AB, ABC, etc., since it is less clumsy in print.

(d) In example (2.a) let E_i be the event that the cell number i is
empty (here i = 1, 2, 3). Similarly, let S_i, D_i, T_i, respectively, denote
the event that the cell number i is occupied simply, doubly, or triply.
Then E_1 E_2 = T_3, and S_1 S_2 ⊂ S_3, and D_1 D_2 = 0. Note also that
T_1 ⊂ E_2, etc. The event D_1 ∪ D_2 ∪ D_3 is defined by the condition
that there exist at least one doubly occupied cell.

(e) Bridge (cf. footnote 1). Let A,B,C, D be the events, respectively, 
that North, South, East, West have at least one ace. It is clear that 
at least one player has an ace, so that one or more of the four events 
must occur. Hence A ∪ B ∪ C ∪ D = S is the whole sample space.
The event ABCD occurs if, and only if, each player has an ace. The
event “West has all four aces” means that none of the three events
A, B, C has occurred; this is the same as the simultaneous occurrence
of A’ and B’ and C’ or the event A’B’C’.

(f) In the example (2.g) we have BC ⊂ A; in words “if husband is
older than wife (B) and wife is older than 40 (C), then husband is
older than 40 (A).” How can the event A − BC be described in words?
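
The relations of example (d), and the identity of example (b), lend themselves to a mechanical check in which events are represented as sets of sample points. The sketch below is our own illustration; the coding of a sample point as a triple of cell numbers and all names in it are assumptions made for the purpose.

```python
from itertools import product

# Sample space of example (2.a): the cell (1, 2 or 3) of each of the balls a, b, c.
space = set(product((1, 2, 3), repeat=3))

def cell_holds(i, k):
    """Event: cell number i contains exactly k balls."""
    return {p for p in space if p.count(i) == k}

E = {i: cell_holds(i, 0) for i in (1, 2, 3)}  # empty
S = {i: cell_holds(i, 1) for i in (1, 2, 3)}  # simply occupied
D = {i: cell_holds(i, 2) for i in (1, 2, 3)}  # doubly occupied
T = {i: cell_holds(i, 3) for i in (1, 2, 3)}  # triply occupied

print(E[1] & E[2] == T[3])         # E_1 E_2 = T_3
print(S[1] & S[2] <= S[3])         # S_1 S_2 is contained in S_3
print(D[1] & D[2] == set())        # D_1 D_2 = 0
print(T[1] <= E[2])                # T_1 is contained in E_2
print(len(D[1] | D[2] | D[3]))     # at least one doubly occupied cell

# Example (b): A - AB = AB', here with A = D_1 and B = E_2.
A, B = D[1], E[2]
print(A - (A & B) == A & (space - B))
```

Intersection, union, difference, and implication of events correspond here to the set operators &, |, - and <=.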


5. DISCRETE SAMPLE SPACES 


The simplest sample spaces are those containing only a finite num- 
ber, n, of points. If n is fairly small (as in the case of tossing a few 
coins), it is easy to visualize the space. The space of distributions of 
cards in bridge is more complicated. However, we may imagine each 
sample point represented on a chip and may then consider the collec- 
tion of these chips as representing the sample space. An event A (like 
“North has two aces”) is represented by a certain set of chips, the
complement A’ by the remaining ones. It takes only one step from
here to imagine a bowl with infinitely many chips or a sample space
with an infinite sequence of points E_1, E_2, E_3, ....


Examples. (a) Let us toss a coin as often as necessary to turn up
one head. The points of the sample space are then E_1 = H, E_2 = TH,
E_3 = TTH, E_4 = TTTH, etc. We may or may not consider as thinkable
the possibility that H never appears. If we do, this possibility
should be represented by a point E_0.

(b) Three players a, b, c take turns at a game, such as chess, according
to the following rules. At the start a and b play while c is out.
The loser is replaced by c and at the second trial the winner plays 
against c while the loser is out. The game continues in this way until 
a player wins twice in succession, thus becoming the winner of the 
game. For simplicity we disregard the possibility of ties at the indi- 
vidual trials. The possible outcomes of our game are indicated by the 
following scheme: 


(*) aa, acc, acbb, acbaa, acbacc, acbacbb, acbacbaa, ...
    bb, bcc, bcaa, bcabb, bcabcc, bcabcaa, bcabcabb, ....


In addition, it is thinkable that no player ever wins twice in succession, 
which means that the play continues indefinitely according to one of 
the patterns 


(**) acbacbacbacb ..., bcabcabcabca ....


The sample space corresponding to our ideal “experiment” is defined
by (*) and (**) and is infinite. It is clear that the sample points can
be arranged in a simple sequence by taking first the two points (**)
and continuing with the points of (*) in the order aa, bb, acc, bcc, ....
(This example is continued in problems 5 and 6; example V(2.); 
problem XV, 5.) 
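
The scheme (*) can be generated from the rules of the game. The following sketch is ours: an outcome is recorded as the sequence of winners of the successive trials, and the parameter max_trials, which cuts off the infinite list, is an assumption introduced only to make the enumeration finite.

```python
def game_outcomes(max_trials):
    """All terminated games of at most max_trials trials, in the order of the text."""
    players = {"a", "b", "c"}
    results = []

    def play(history, champion, challenger):
        # champion won the last trial; challenger was out and now plays him.
        if len(history) >= max_trials:
            return
        # The champion wins again: two wins in succession end the game.
        results.append(history + champion)
        # The challenger wins: he meets the player who was out during this trial.
        third = (players - {champion, challenger}).pop()
        play(history + challenger, challenger, third)

    # First trial: a plays b while c is out; either a or b wins.
    play("a", "a", "c")
    play("b", "b", "c")
    return sorted(results, key=lambda s: (len(s), s))

print(game_outcomes(7))
```

For every length from two upward there are exactly two terminated games, which is why the points can be arranged in a simple sequence.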


Definition. A sample space is called discrete if it contains only finitely 
many points or infinitely many points which can be arranged into a simple 
sequence E_1, E_2, ....


Not every sample space is discrete. It is a known theorem (due to
G. Cantor) that the sample space consisting of all positive numbers is
not discrete. We are here confronted with a distinction familiar in
mechanics. There it is usual first to consider discrete mass points
with each individual point carrying a finite mass, and then to pass to
the notion of a continuous mass distribution, where each individual
point has zero mass. In the first case, the mass of a system is obtained
simply by adding the masses of the individual points; in the second
case, masses are computed by integration over mass densities.
Similarly, the probabilities of events in discrete sample spaces are
obtained by mere additions, whereas in other spaces integrations are
necessary. Except for the technical tools required, there is no essential
difference between the two cases. In order to present actual probability
considerations unhampered by technical difficulties, we shall take up
only discrete sample spaces. It will be seen that even this special case
leads to many interesting and important results.


In this volume we shall consider only discrete sample spaces.


6. PROBABILITIES IN DISCRETE SAMPLE SPACES:
PREPARATIONS


The probabilities of the various events are numbers of the same 
nature as distances in geometry or masses in mechanics. The theory 
assumes that they are given but need assume nothing about their actual 
numerical values or how they are measured in practice. Some of the 
most important applications are of a qualitative nature and independent
of numerical values; the general conclusions of the theory are applied
in many ways exactly as the theorems of geometry serve as a
basis for physical theories and engineering applications. In the relatively
few instances where numerical values for probabilities are required,
the methods of procedure vary as widely as do the methods of
determining distances. There is little in common in the practices of 
the carpenter, the practical surveyor, the pilot, and the astronomer 
when they measure distances. In our context, we may consider the
diffusion constant, which is a notion of the theory of probability. To
find its numerical value, physical considerations relating it to other
theories are required; a direct measurement is impossible. By contrast, 
mortality tables are constructed from rather crude observations. In 
most actual applications the determination of probabilities, or the com- 
parison of theory and observation, requires rather sophisticated statis- 
tical methods, which in turn are based on a refined probability theory. 
In other words, the intuitive meaning of probability is clear, but only 
as the theory proceeds shall we be able to see how it is applied. All 
possible “definitions” of probability fall far short of the actual practice. 

When tossing a “good” coin we do not hesitate to associate probability
1/2 with either head or tail. This amounts to saying that when
a coin is tossed n times all 2^n possible results have the same probability.
From a theoretical standpoint, this is a convention. Frequently, it has
been contended that this convention is logically unavoidable and the
only possible one. Yet there have been philosophers and statisticians
defying the convention and starting from contradictory assumptions
(uniformity or non-uniformity in nature). It has also been claimed
that the probabilities 1/2 are due to experience. As a matter of fact,
whenever refined statistical methods have been used to check on actual 
coin tossing, the result has been invariably that head and tail are not 
equally likely. And yet we stick to our model of an “ideal” coin, even 
though no good coins exist. We preserve the model not merely for its 
logical simplicity, but essentially for its usefulness and applicability. 
In many applications it is sufficiently accurate to describe reality. 
More important is the empirical fact that departures from our scheme 
are always coupled with phenomena such as an eccentric position of
the center of gravity. In this way our idealized model can be extremely
useful even if it never applies exactly. For example, in modern statistical
quality control based on Shewhart’s methods, idealized probability
models are used to discover “assignable causes” for flagrant departures
from these models and thus to remove impending machine troubles and 
process irregularities at an early stage. 

Similar remarks apply to other cases. The number of possible distributions
of cards in bridge is almost 10^30. Usually we agree to consider
them as equally probable. For a check of this convention more
than 10^30 experiments would be required, that is, thousands of billions of
years if every living person played one game every second, day and 
night. However, consequences of the assumption can be verified ex- 
perimentally, for example, by observing the frequency of multiple aces 
in the hands at bridge. It turns out that for crude purposes the ideal- 
ized model describes experience sufficiently well, provided the card 
shuffling is done better than is usual. It is more important that the 
idealized scheme, when it does not apply, permits the discovery of 
“assignable causes” for the discrepancies, for example, the reconstruc- 
tion of the mode of shuffling. These are examples of limited impor- 
tance, but they indicate the usefulness of assumed models. More in- 
teresting cases will appear only as the theory proceeds. 


Examples. (a) Distinguishable balls. In example (2.a) it appears
natural to assume that all sample points are equally probable, that is,
that each sample point has probability 1/27. We can start from this
definition and investigate its consequences. Whether or not our model
will come reasonably close to actual experience will depend on the type
of phenomena to which it is applied. In some applications the assumption
of equal probabilities is imposed by physical considerations; in
others it is introduced to serve as the simplest model for a general
orientation, even though it quite obviously represents only a crude
first approximation (e.g., consider the examples (2.b, 1), birthdays;
(2.b, 7), elevator problem; or (2.b, 11) coupon collecting).

(b) Indistinguishable balls: Bose-Einstein statistics. We now turn to
the example (2.c) of three indistinguishable balls in three cells. It is
possible to argue that the actual physical experiment is unaffected by
our failure to distinguish between the balls; physically there remain 27
different possibilities, even though only ten different forms are distinguishable.
This consideration leads us to attribute the following probabilities
to the ten points of table 2.

Point number:   1     2     3     4    5    6    7    8    9    10
Probability:   1/27  1/27  1/27  1/9  1/9  1/9  1/9  1/9  1/9  2/9

It must be admitted that for most applications listed in example (2.b)
this argument appears sound and the assignment of probabilities rea- 
sonable. Historically, our argument was accepted for a long time with- 
out question and served in statistical mechanics as the basis for the 
derivation of the Maxwell-Boltzmann statistics for the distribution of r
balls in n cells. The greater was the general surprise when Bose and
Einstein showed that certain particles are subject to the Bose-Einstein
statistics (for details see chapter II, section 5). In our case with
r = n = 3, the Bose-Einstein model attributes probability 1/10 to each
of the ten sample points. 

This example will show that different assignments of probabilities 
are compatible with the same sample space and will illustrate the intri- 
cate interrelation between theory and experience. In particular, it 
teaches us not to rely too much on a priori arguments and to be pre- 
pared to accept new and unforeseen schemes. 
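
The two assignments of probabilities on the same ten-point sample space can be set side by side. The sketch below is an illustration of ours for r = n = 3 only: the Maxwell-Boltzmann values are obtained by counting how many of the 27 equally probable placements lead to each occupancy pattern, and the Bose-Einstein values are taken, as stated above, to be 1/10 for every point.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

cells, balls = 3, 3
placements = list(product(range(cells), repeat=balls))  # 27 distinguishable placements

# Collapse each placement to its occupancy numbers (number of balls per cell).
counts = Counter(tuple(p.count(c) for c in range(cells)) for p in placements)

maxwell_boltzmann = {occ: Fraction(k, len(placements)) for occ, k in counts.items()}
bose_einstein = {occ: Fraction(1, len(counts)) for occ in counts}

for occ in sorted(counts, reverse=True):
    print(occ, maxwell_boltzmann[occ], bose_einstein[occ])
```

Both dictionaries have the same ten keys and both sets of values add to unity; only the weights attached to the points differ.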

(c) Coin tossing. A frequency interpretation of the postulate of
equal probabilities requires records of actual experiments. Now in
reality every coin is biased, and it is possible to devise physical experiments
which come much closer to the ideal model of coin tossing than
real coins ever do. To give an idea of the fluctuations to be expected,
we give the record of such a simulated experiment corresponding to
10,000 trials with a coin.³ Table 3 contains the number of occurrences
of “heads” in a series of 100 experiments each corresponding to a sequence
of 100 trials with a coin. The grand total is 4979. Looking at
these figures the reader is very probably left with a vague feeling of:
So what? The truth is that a more advanced theory is necessary to
judge to what extent such empirical data agree with our abstract model.
(Incidentally, we shall return to this material in chapter III, section 7.)


TABLE 3

Trials number |        Numbers of heads        | Total

    0- 1,000  | 54 46 53 55 46 54 41 48 51 53 |  501
     - 2,000  | 48 46 40 53 49 49 48 54 53 45 |  485
     - 3,000  | 43 52 58 51 51 50 52 50 53 49 |  509
     - 4,000  | 58 60 54 55 50 48 47 57 52 55 |  536
     - 5,000  | 48 51 51 49 44 52 50 46 53 41 |  485
     - 6,000  | 49 50 45 52 52 48 47 47 47 51 |  488
     - 7,000  | 45 47 41 51 49 59 50 55 53 50 |  500
     - 8,000  | 53 52 46 52 44 51 48 51 46 54 |  497
     - 9,000  | 45 47 46 52 47 48 59 57 45 48 |  494
     -10,000  | 47 41 51 48 59 51 52 55 39 41 |  484


³ The table actually records the frequency of even digits in a section of A Million
Random Digits with 100,000 Normal Deviates, by The RAND Corporation, The Free
Press, Glencoe, Illinois, 1955.
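
The totals of table 3 and the size of the fluctuations can be recomputed from the recorded counts. The sketch below merely transcribes the hundred entries of the table; the variable names are ours.

```python
# Numbers of heads in 100 experiments of 100 trials each, row by row as in table 3.
rows = [
    [54, 46, 53, 55, 46, 54, 41, 48, 51, 53],
    [48, 46, 40, 53, 49, 49, 48, 54, 53, 45],
    [43, 52, 58, 51, 51, 50, 52, 50, 53, 49],
    [58, 60, 54, 55, 50, 48, 47, 57, 52, 55],
    [48, 51, 51, 49, 44, 52, 50, 46, 53, 41],
    [49, 50, 45, 52, 52, 48, 47, 47, 47, 51],
    [45, 47, 41, 51, 49, 59, 50, 55, 53, 50],
    [53, 52, 46, 52, 44, 51, 48, 51, 46, 54],
    [45, 47, 46, 52, 47, 48, 59, 57, 45, 48],
    [47, 41, 51, 48, 59, 51, 52, 55, 39, 41],
]

row_totals = [sum(row) for row in rows]            # heads per 1,000 trials
grand_total = sum(row_totals)                      # heads in 10,000 trials
counts = [c for row in rows for c in row]

print(row_totals)
print(grand_total, grand_total / 10_000)           # relative frequency of heads
print(min(counts), max(counts))                    # extremes among the 100 experiments
```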


7. THE BASIC DEFINITIONS AND RULES 


Fundamental Convention. Given a discrete sample space S with
the sample points E_1, E_2, ..., we shall assume that with each point E_i there
is associated a number, called the probability of E_i and denoted by P{E_i}.
It is to be non-negative and such that


(7.1) P{E_1} + P{E_2} + ... = 1.


Note that we do not exclude the possibility that a point has prob- 
ability zero. This convention may appear artificial but is necessary 
to avoid complications. In discrete sample spaces probability zero is
in practice interpreted as an impossibility, and any sample point known 
to have probability zero can, with impunity, be eliminated from the 
sample space. However, frequently the numerical values of the prob- 
abilities are not known in advance, and involved considerations are re- 
quired to decide whether or not a certain sample point has positive 
probability. 


Definition. The probability P{A} of any event A is the sum of the
probabilities of all sample points in it.


The fundamental equation (7.1) states that the probability of the
entire sample space S is unity, or P{S} = 1. It follows that for any
event A


(7.2) 0 ≤ P{A} ≤ 1.
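
The definition translates directly into a summation over sample points. As an illustration we take the three tosses of example (2.f) and, as an assumption in line with the convention for a “good” coin, give each of the eight points probability 1/8; the names points, prob and P are ours.

```python
from fractions import Fraction
from itertools import product

points = ["".join(t) for t in product("HT", repeat=3)]   # HHH, HHT, ..., TTT
prob = {pt: Fraction(1, len(points)) for pt in points}    # assumed: 1/8 each

def P(event):
    """P{A}: the sum of the probabilities of all sample points in A."""
    return sum((prob[pt] for pt in event), Fraction(0))

A = {pt for pt in points if pt.count("H") >= 2}   # two or more heads
B = {pt for pt in points if pt.count("T") == 1}   # just one tail

print(P(set(points)), P(A), P(B), P(set()))
```

The impossible event, containing no points, receives probability zero, and the whole space receives probability one, so that (7.2) holds for every event.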


Consider now two arbitrary events A_1 and A_2. To compute the
probability P{A_1 ∪ A_2} that either A_1 or A_2 or both occur, we have
to add the probabilities of all sample points contained either in A_1 or
in A_2, but each point is to be counted only once. We have, therefore,


(7.3) P{A_1 ∪ A_2} ≤ P{A_1} + P{A_2}.


Now, if E is any point contained both in A_1 and in A_2, then P{E}
occurs twice in the right-hand member but only once in the left-hand
member. Therefore, the right side exceeds the left side by the amount
P{A_1 A_2}, and we have the simple but important

Theorem. For any two events A_1 and A_2 the probability that either
A_1 or A_2 or both occur is given by


(7.4) P{A_1 ∪ A_2} = P{A_1} + P{A_2} − P{A_1 A_2}.


If A_1 A_2 = 0, that is, if A_1 and A_2 are mutually exclusive, then (7.4)
reduces to


(7.5) P{A_1 ∪ A_2} = P{A_1} + P{A_2}.


Example. A coin is tossed twice. For sample space we take the
four points HH, HT, TH, TT, and associate with each probability $\frac14$.
Let $A_1$ and $A_2$ be, respectively, the events “head at first and second
trial.” Then $A_1$ consists of HH and HT, and $A_2$ of TH and HH. Fur-
thermore $A = A_1 \cup A_2$ contains the three points HH, HT, and TH,
whereas $A_1A_2$ consists of the single point HH. Thus


$P\{A_1 \cup A_2\} = \frac12 + \frac12 - \frac14 = \frac34.$
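
The coin example can also be checked mechanically. The following is only a minimal sketch in Python, not part of the original text; the names `space`, `prob`, and `P` are our own choices, and events are represented as plain sets of sample points.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses; each point carries probability 1/4.
space = ["".join(p) for p in product("HT", repeat=2)]
prob = {point: Fraction(1, 4) for point in space}

def P(event):
    """Probability of an event: the sum of the probabilities of its points."""
    return sum((prob[e] for e in event), Fraction(0))

A1 = {s for s in space if s[0] == "H"}   # head at first trial
A2 = {s for s in space if s[1] == "H"}   # head at second trial

lhs = P(A1 | A2)                         # left side of (7.4)
rhs = P(A1) + P(A2) - P(A1 & A2)         # right side of (7.4)
```

Both `lhs` and `rhs` are exact fractions, so the two sides of (7.4) can be compared without rounding.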


The probability $P\{A_1 \cup A_2 \cup \cdots \cup A_n\}$ of the realization of at least
one among n events can be computed by a formula analogous to (7.4);
this will be taken up in chapter IV, section 1. Here we note only that
the inequality (7.3) obviously holds in general. Thus for arbitrary events
$A_1, A_2, \ldots$ the inequality


(7.6)  $P\{A_1 \cup A_2 \cup \cdots\} \le P\{A_1\} + P\{A_2\} + \cdots$

holds. In the special case where the events $A_1, A_2, \ldots$ are mutually
exclusive, we have

(7.7)  $P\{A_1 \cup A_2 \cup \cdots\} = P\{A_1\} + P\{A_2\} + \cdots.$


Occasionally (7.6) is referred to as Boole’s inequality. 

We shall first investigate the simple special case where the sample 
space has a finite number, N, of points each having probability 1/N. 
In this case, the probability of any event A equals the number of points 
in A divided by N. In the older literature, the points of the sample 
space were called “cases,” and the points of A “favorable” cases (favor-
able for A). If all points have the same probability, then the prob-
ability of an event A is the ratio of the number of favorable cases to
the total number of cases. Unfortunately, this statement has been
much abused to provide a “definition” of probability. It is often con-
tended that in every finite sample space probabilities of all points are
equal. This is not so. For a single throw of an untrue coin, the sample
space still contains only the two points, head and tail, but they may
have arbitrary probabilities p and q, with p + q = 1. A newborn baby
is a boy or girl, but in applications we have to admit that the two
possibilities are not equally likely. A further counterexample is pro- 
vided by (6.b). The usefulness of sample spaces in which all sample 
points have the same probability is restricted almost entirely to the 
study of games of chance and to combinatorial analysis. 


8. PROBLEMS FOR SOLUTION 


1. Among the digits 1, 2, 3, 4, 5 first one is chosen, and then a second selection 
is made among the remaining four digits. Assume that all twenty possible re-
sults have the same probability. Find the probability that an odd digit will
be selected (a) the first time, (b) the second time, (c) both times. 

2. In the sample space of example (2.a) attach equal probabilities to all 27 
points. Using the notation of example (4.d), verify formula (7.4) for the two 
events $A_1 = S_1$ and $A_2 = S_2$. How many points does $S_1S_2$ contain?

3. Consider the 24 possible arrangements (permutations) of the symbols 
1234 and attach to each probability $\frac{1}{24}$. Let $A_i$ be the event that the digit
$i$ appears at its natural place (where i = 1, 2, 3, 4). Verify formula (7.4).

4. A coin is tossed until for the first time the same result appears twice in 
succession. To every possible outcome requiring n tosses attribute probability
$1/2^n$. Describe the sample space. Find the probability of the following events:
(a) the experiment ends before the sixth toss, (b) an even number of tosses is
required.

5. In the sample space of example (5.b) let us attribute to each point of
(*) containing exactly k letters probability $1/2^k$. (In other words, aa and bb
carry probability $\frac14$, acb has probability $\frac18$, etc.) (a) Show that the probabilities
of the points of (*) add up to unity, whence the two points (**) receive proba-
bility zero. (b) Show that the probability that a wins is $\frac{5}{14}$. The probability
of b winning is the same, and c has probability $\frac27$ of winning. (c) The probabil-
ity that no decision is reached at or before the kth turn (game) is $1/2^{k-1}$.

6. Modify example (5.b) to take account of the possibility of ties at the individ-
ual games. Describe the appropriate sample space. How would you define
probabilities?

7. In problem 3 show that $A_1A_2A_3 \subset A_4$ and $A_1A_2A_3' \subset A_4'$.

8. Using the notations of example (4.d) show that (a) $S_1S_2D_3 = 0$; (b)
$S_1D_2 \subset E_3$; (c) $E_3 - D_2S_1 \supset S_2D_1$.


9. Two dice are thrown. Let A be the event that the sum of the faces is
odd, B the event of at least one ace. Describe the events $AB$, $A \cup B$, $AB'$.
Find their probabilities assuming that all 36 sample points have equal
probabilities.

10. In example (2.g), discuss the meaning of the following events: (a) $ABC$,
(b) $A - AB$, (c) $AB'C$.

11. In example (2.g), verify that $AC' \subset B$.

12. Bridge (cf. footnote 1). For k = 1, 2, 3, 4 let $N_k$ be the event that North
has at least k aces. Let $S_k$, $E_k$, $W_k$ be the analogous events for South, East,
West. What can be said about the number x of aces in West’s possession in
the events (a) $W_1'$, (b) $N_2S_2$, (c) $N_1'S_1'E_1'$, (d) $W_2 - W_3$, (e) $N_1S_1E_1W_1$,
(f) $N_3W_1$, (g) $(N_2 \cup S_2)E_2$?

13. In the preceding problem verify that (a) $S_3 \subset S_2$, (b) $S_3W_2 = 0$, (c)
$N_2S_1E_1W_1 = 0$, (d) $N_2S_2 \subset W_1'$, (e) $(N_2 \cup S_2)W_3 = 0$, (f) $W_4 = N_1'S_1'E_1'$.
14. Verify the following relations.⁴


(a) $(A \cup B)' = A'B'$.                 (e) $(A \cup B) - AB = AB' \cup A'B$.
(b) $(A \cup B) - B = A - AB = AB'$.      (f) $A' \cup B' = (AB)'$.
(c) $AA = A \cup A = A$.                  (g) $(A \cup B)C = AC \cup BC$.
(d) $(A - AB) \cup B = A \cup B$.

15. Find simple expressions for
(a) $(A \cup B)(A \cup B')$, (b) $(A \cup B)(A' \cup B)(A \cup B')$, (c) $(A \cup B)(B \cup C)$.

16. State which of the following relations are correct and which incorrect:
(a) $(A \cup B) - C = A \cup (B - C)$.
(b) $ABC = AB(C \cup B)$.
(c) $A \cup B \cup C = A \cup (B - AB) \cup (C - AC)$.
(d) $A \cup B = (A - AB) \cup B$.
(e) $AB \cup BC \cup CA \supset ABC$.
(f) $(AB \cup BC \cup CA) \subset (A \cup B \cup C)$.
(g) $(A \cup B) - A = B$.
(h) $AB'C \subset A \cup B$.
(i) $(A \cup B \cup C)' = A'B'C'$.
(j) $(A \cup B)'C = A'C \cup B'C$.
(k) $(A \cup B)'C = A'B'C$.
(l) $(A \cup B)'C = C - C(A \cup B)$.
17. Let A, B, C be three arbitrary events. Find expressions for the events 
that of A, B, C: 


(a) Only A occurs. (f) One and no more occurs. 
(b) Both A and B, but not C, occur. (g) Two and no more occur. 
(c) All three events occur. (h) None occurs. 

(d) At least one occurs. (i) Not more than two occur. 


(e) At least two occur. 

18. The union A U B of two events can be expressed as the union of two 
mutually exclusive events, thus: A U B = A U (B — AB). Express in a 
similar way the union of three events A, B, C. 

19. Using the result of problem 18 prove that 


$P\{A \cup B \cup C\} = P\{A\} + P\{B\} + P\{C\} - P\{AB\} - P\{AC\} - P\{BC\} + P\{ABC\}.$
[This formula is a special case of IV(1.5).]


⁴ Notice that $(A \cup B)'$ denotes the complement of $A \cup B$ which is not the same
as $A' \cup B'$. Similarly, $(AB)'$ is not the same as $A'B'$.


CHAPTER II


Elements 
of Combinatorial Analysis 


The purpose of this chapter is to derive a few basic formulas and to 
develop the corresponding probabilistic background. A more advanced 
reader may pass directly to chapter V where the main theoretical thread 
of chapter I is taken up again. 

In the study of simple games of chance, sampling procedures, occu-
pancy and order problems, etc., we are usually dealing with finite sam-
ple spaces in which the same probability is attributed to all points.
To compute the probability of an event A we have then to divide the 
number of sample points in A (“favorable cases”) by the total number 
of sample points (“possible cases”). This is facilitated by a systematic 
use of a few rules which we shall now proceed to review. Simplicity 
and economy of thought can be achieved by adhering to a few standard 
tools, and we shall follow this procedure instead of describing the 
shortest computational method in each special case.¹


1. PRELIMINARIES

Pairs. With m elements $a_1, \ldots, a_m$ and n elements $b_1, \ldots, b_n$, it is
possible to form mn pairs $(a_j, b_k)$ containing one element from each group.


Proof. Arrange the pairs in a rectangular array in the form of a
multiplication table with m rows and n columns so that $(a_j, b_k)$ stands
at the intersection of the jth row and kth column. Then each pair ap-
pears once and only once, and the assertion becomes obvious.


¹ The interested reader will find many topics of elementary combinatorial analysis
treated in the classical textbook, Choice and chance, by W. A. Whitworth, fifth
edition, London, 1901, reprinted by G. E. Stechert, New York, 1942. The com-
panion volume by the same author, DCC exercises, reprinted New York, 1945,
contains 700 problems with complete solutions.


Examples. (a) Bridge cards (cf. footnote 1 to chapter I, section 1).
As sets of elements take the four suits and the thirteen face values,
respectively. Each card is defined by its suit and its face value, and
there exist 4·13 = 52 such combinations, or cards.

(b) “Seven-way lamps.” Some floor lamps so advertised contain 3 
ordinary bulbs and also an indirect lighting fixture which can be oper- 
ated on three levels but need not be used at all. Each of these four 
possibilities can be combined with 0, 1, 2, or 3 bulbs. Hence there are 
4·4 = 16 possible combinations of which one, namely (0,0), means
that no bulb is on. There remain fifteen (not seven) ways of operating 
the lamps. 


Multiplets. Given $n_1$ elements $a_1, \ldots, a_{n_1}$ and $n_2$ elements $b_1, \ldots, b_{n_2}$,
etc., up to $n_r$ elements $x_1, \ldots, x_{n_r}$; it is possible to form $n_1 \cdot n_2 \cdots n_r$
ordered r-tuplets $(a_{j_1}, b_{j_2}, \ldots, x_{j_r})$ containing one element of each kind.


Proof. If r = 2, the assertion reduces to the first rule. If r = 3,
take the pair $(a_i, b_j)$ as element of a new kind. There are $n_1n_2$ such
pairs and $n_3$ elements $c_k$. Each triple $(a_i, b_j, c_k)$ is itself a pair consisting
of $(a_i, b_j)$ and an element $c_k$; the number of triplets is therefore $n_1n_2n_3$.
Proceeding by induction, the assertion follows for every r.


Perhaps the simplest and most useful way of describing the last
theorem is as follows. To form an r-tuplet $(a_{j_1}, b_{j_2}, \ldots, x_{j_r})$ we
have to choose one a, one b, etc. We have to perform r selections in
all and have in succession $n_1, n_2, \ldots, n_r$ possibilities to choose from.
It is asserted that this procedure can lead to $n_1 \cdot n_2 \cdots n_r$ different
results.


Examples. (c) Multiple classifications. Suppose that people are 
classified according to sex, marital status, and profession. The various 
categories play the role of elements. If there are 17 professions, then 
we have 2·2·17 = 68 classes in all.

(d) In an agricultural experiment three different treatments are to 
be tested (for example, the application of a fertilizer, a spray, and tem-
perature). If these treatments can be applied on $r_1$, $r_2$, and $r_3$ levels
or concentrations, respectively, then there exist a total of $r_1r_2r_3$ com-
binations, or ways of treatment.

(e) “Placing balls into cells” amounts to choosing one cell for each 
ball. With r balls we have r independent choices, and therefore r balls
can be placed into n cells in $n^r$ different ways. It will be recalled from
example I(2.b) that a great variety of conceptual experiments are ab-
stractly equivalent to that of placing balls into cells. For example,
considering the faces of a die as “cells,” the last proposition implies
that the experiment of throwing a die r times has $6^r$ possible outcomes,
of which $5^r$ satisfy the condition that no ace turns up. Assuming that
all outcomes are equally probable, the event “no ace in r throws” has
therefore probability $(\frac56)^r$. We might expect naively that in six throws
“an ace should turn up,” but the probability of this event is only
$1 - (\frac56)^6$ or less than $\frac23$. [Cf. example (3.b).]
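
The counting argument of example (e) can be tried out directly. What follows is a small Python sketch under our own naming (`p_no_ace`, `p_no_ace_enumerated` are illustrative, not taken from the text): the first function uses the rule $5^r/6^r$, the second lists all $6^r$ outcomes and counts, which is feasible only for small r.

```python
from fractions import Fraction
from itertools import product

def p_no_ace(r):
    """Probability of no ace in r throws, by the counting rule 5^r / 6^r."""
    return Fraction(5, 6) ** r

def p_no_ace_enumerated(r):
    """Same probability by listing all 6^r outcomes (small r only)."""
    outcomes = list(product(range(1, 7), repeat=r))
    favorable = sum(1 for o in outcomes if 1 not in o)
    return Fraction(favorable, len(outcomes))

# Probability that an ace does turn up in six throws.
p_ace_in_six = 1 - p_no_ace(6)
```

Comparing the two functions for a few small values of r is a quick way to convince oneself that the multiplication rule and brute-force listing agree.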


2. ORDERED SAMPLES 


Consider the set or “population” of n elements $a_1, a_2, \ldots, a_n$. Any
ordered arrangement $a_{j_1}, a_{j_2}, \ldots, a_{j_r}$ of r symbols is called an ordered
sample of size r drawn from our population. For an intuitive picture 
we can imagine that the elements are selected one by one. Two pro- 
cedures are then possible. First, sampling with replacement; here each 
selection is made from the entire population, so that the same element 
can be drawn more than once. The samples are then arrangements in 
which repetitions are permitted. Second, sampling without replacement; 
here an element once chosen is removed from the population, so that 
the sample becomes an arrangement without repetitions. Obviously, 
in this case, the sample size r cannot exceed the population size n. 

In sampling with replacement each of the r elements can be chosen
in n ways: the number of possible samples is therefore $n^r$, as can be
seen from the last theorem with $n_1 = n_2 = \cdots = n$. In sampling with-
out replacement we have n possible choices for the first element, but
only $n - 1$ for the second, $n - 2$ for the third, etc. Using the same
rule, we see that in this case we have $n(n-1)\cdots(n-r+1)$ choices
in all. Products of this type appear so often that it is convenient to
introduce the notation²


(2.1)  $(n)_r = n(n-1)\cdots(n-r+1).$

Clearly $(n)_r = 0$ for integers r > n. We have thus the following


Theorem. For a population of n elements and a prescribed sample
size r, there exist $n^r$ different samples with replacement and $(n)_r$ samples
without replacement.
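
For a small population both counts in the theorem can be confirmed by listing the samples. The sketch below is illustrative only; the four-letter population and the helper name `falling` are our own assumptions. It relies on `itertools.product` for samples with replacement and `itertools.permutations` for samples without replacement.

```python
from itertools import permutations, product

def falling(n, r):
    """(n)_r = n(n-1)...(n-r+1); the empty product (r = 0) is taken as 1."""
    result = 1
    for i in range(r):
        result *= n - i
    return result

population = "abcd"            # hypothetical population, n = 4
n, r = len(population), 2

# Ordered samples of size r, with and without replacement.
with_replacement = list(product(population, repeat=r))
without_replacement = list(permutations(population, r))
```

Note that `falling(n, r)` returns 0 as soon as r exceeds an integer n, because a zero factor enters the product.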


We note the special case where r = n. In sampling without replace-
ment a sample of size n includes the whole population and represents
a reordering (or permutation) of its elements. Accordingly, n elements
$a_1, \ldots, a_n$ can be ordered in $(n)_n = n \cdot (n-1) \cdots 2 \cdot 1$ different ways.
Instead of $(n)_n$ we write n!, which is the more usual notation. We see
that our theorem has the following


Corollary. The number of different orderings of n elements is

$n! = n \cdot (n-1) \cdots 2 \cdot 1.$


² The notation $(n)_r$ is not standard, but it will be used consistently in this book
even if n is not an integer.


Mr. and Mrs. Smith form a sample of size two drawn from the 
human population; at the same time, they form a sample of size one 
drawn from the population of all couples. This example shows that 
the sample size is defined only in relation to a given population. Toss- 
ing a coin r times is one way of obtaining a sample of size r drawn from 
the population of the two letters, H and T. The same arrangement of r 
letters H and T is a single sample point in the space corresponding to 
the experiment of tossing a coin r times. 

Drawing r elements from a population of size n is an experiment 
whose possible outcomes are samples of size r. Their number is $n^r$ or
$(n)_r$, depending on whether or not replacement is used. In either case,
our conceptual experiment is described by a sample space in which 
each individual point represents a sample of size r. 

So far we have not spoken of probabilities associated with our sam- 
ples. Usually we shall assign equal probabilities to all of them and then 
speak of random samples. The word “random” is not well defined, 
but when applied to samples or selections it has a unique meaning. 
Whenever we speak of random samples of fixed size r, the adjective ran- 
dom is to imply that all possible samples have the same probability, namely, 
$n^{-r}$ in sampling with replacement and $1/(n)_r$ in sampling without re-
placement, n denoting the size of the population from which the sample
is drawn. If n is large and r relatively small, the ratio $(n)_r/n^r$ is near
unity. This leads us to expect that, for large populations and relatively
small samples, the two ways of sampling are practically equivalent (cf.
problems 11.1, 11.2, and VI, 35).
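
How near to unity the ratio $(n)_r/n^r$ comes can be seen numerically. The following Python sketch is merely an illustration; the function name `ratio` and the two population sizes are arbitrary choices of ours.

```python
from fractions import Fraction

def ratio(n, r):
    """(n)_r / n^r, built up factor by factor as (1 - i/n) for i = 0..r-1."""
    value = Fraction(1)
    for i in range(r):
        value *= Fraction(n - i, n)
    return value

# Samples of size 5 from a small and from a large population.
small_population = float(ratio(10, 5))
large_population = float(ratio(1_000_000, 5))
```

For n = 10 the two ways of sampling differ markedly, whereas for a population of a million the ratio differs from 1 only in the fifth decimal place.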

We have introduced a practical terminology but have made no state- 
ments about the applicability of our model of random sampling to 
reality. Tossing coins, throwing dice, and similar activities may be 
interpreted as experiments in practical random sampling with replace- 
ments, and our probabilities are numerically close to frequencies ob- 
served in long-run experiments, even though perfectly balanced coins 
or dice do not exist. Random sampling without replacement is typified 
by successive drawings of cards from a shuffled deck (provided shuffling 
is done much better than is usual). In sampling human populations 
the statistician encounters considerable and often unpredictable diffi-
culties, and bitter experience has shown that it is difficult to obtain
even a crude image of randomness.


Exercise. In sampling without replacement the probability for any fixed ele-
ment of the population to be included in a random sample of size r is


$1 - \frac{(n-1)_r}{(n)_r} = 1 - \frac{n-r}{n} = \frac{r}{n}.$


In sampling with replacement the corresponding probability is $1 - \{(n-1)/n\}^r$.


3. EXAMPLES


We consider random samples of size r with replacement taken from a
population of the n elements $a_1, \ldots, a_n$. We are interested in the
event A that in such a sample $(a_{j_1}, \ldots, a_{j_r})$ no element appears twice,
that is, that our sample could have been obtained also by sampling
without replacement. The last theorem shows that there exist $n^r$ dif-
ferent samples in all, of which $(n)_r$ satisfy the stipulated condition.
Assuming that all arrangements have equal probability, we conclude
that the probability of no repetition in our sample is


(3.1)  $p = \frac{(n)_r}{n^r} = \frac{n(n-1)\cdots(n-r+1)}{n^r}.$


The following concrete interpretations of this formula will reveal sur- 
prising features. 

(a) Random sampling numbers. Let the population consist of the 
ten digits 0, 1, ..., 9. Every succession of five digits represents a 
sample of size r = 5, and we assume that each such arrangement has
probability $10^{-5}$. By (3.1), the probability that five consecutive random
digits are all different is $p = (10)_5 10^{-5} = 0.3024$.

We expect intuitively that in large mathematical tables having many 
decimal places the last five digits will have many properties of ran- 
domness. (In ordinary logarithmic and many other tables the tabu- 
lar difference is nearly constant, and the last digit therefore varies 
regularly.) As an experiment, sixteen-place tables³ were selected and
the entries were counted whose last five digits are all different. In the 
first twelve batches of a hundred entries each, the number of entries 
with five different digits varied as follows: 30, 27, 30, 34, 26, 32, 37, 
36, 26, 31, 36, 32. Small-sample theory shows that the magnitude of 
the fluctuations is well within the expected limits. The average fre- 
quency is 0.3142, which is rather close to the theoretical probability, 
0.3024 [cf. example VII(3.f)].

Consider next the number e = 2.71828.... The first 800 decimals⁴
form 160 groups of five digits each, which we arrange in sixteen batches
of ten each. In these sixteen batches the numbers of groups in which
all five digits are different are as follows:


3, 1, 3, 4, 4, 1, 4, 4, 4, 2, 3, 1, 5, 4, 6, 3.


The frequencies again oscillate around the value 0.3024, and small-
sample theory confirms that the magnitude of the fluctuations is not
larger than should be expected. The overall frequency of our event
in the 160 groups is 52/160 = 0.325, which is reasonably close to
p = 0.3024.


³ Tables of probability functions, vol. I, National Bureau of Standards, 1941.

⁴ Intermédiaire des recherches mathématiques, vol. 2, 1946, p. 112.

(b) If n balls are randomly placed into n cells, the probability that each
cell will be occupied is $n!/n^n$. This probability is surprisingly small: for
n = 7 it is only 0.00612.... This means that if in a city seven accidents
occur each week, then (assuming that all possible distributions are equally
likely) practically all weeks will contain days with two or more accidents,
and on the average only one week out of 165 will show a uniform distribu-
tion of one accident per day. This example shows an unexpected char-
acteristic of pure randomness. (All possible configurations of seven
balls in seven cells are exhibited in table 1, section 5. With probability
about 0.87 it will be observed that two or more cells remain empty.)
For n = 6 the probability $n!\,n^{-n}$ equals 0.01543.... This shows how
extremely improbable it is that in six throws with a perfect die all
faces turn up. [The probability that a particular face does not turn
up is about $\frac13$; cf. example (1.e).]
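
The numerical values quoted in example (b) are easy to recompute. The sketch below is only an illustration in Python; the function name `p_all_cells_occupied` and the variable names are our own.

```python
from fractions import Fraction
from math import factorial

def p_all_cells_occupied(n):
    """Probability n!/n^n that n balls placed at random in n cells leave none empty."""
    return Fraction(factorial(n), n ** n)

p7 = p_all_cells_occupied(7)     # seven accidents, seven days
p6 = p_all_cells_occupied(6)     # six throws of a die, all faces appear

# Average number of weeks per week with exactly one accident each day.
weeks_per_uniform_week = 1 / p7
```

The reciprocal `1 / p7` gives the average spacing between weeks with a uniform distribution; it comes out in the neighborhood of the round figure quoted above.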

(c) Elevator. An elevator starts with r = 7 passengers and stops at
n = 10 floors. What is the probability p that no two passengers leave
at the same floor? To render the question precise, we assume that all
arrangements of discharging the passengers have the same probability
(which is a crude approximation). Then


$p = 10^{-7}(10)_7 = (10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4)10^{-7} = 0.06048.$


When the event was once observed, the occurrence was deemed re-
markable and odds of 1000 to 1 were offered against a repetition.
(Cf. the answer to problem 10.43.)

(d) Birthdays. The birthdays of r people form a sample of size r 
from the population of all days in the year. The years are not of equal 
length, and we know that the birth rates are not quite constant through- 
out the year. However, in a first approximation, we may take a random 
selection of people as equivalent to a random selection of birthdays and 
consider the year as consisting of 365 days. 

With these conventions we can interpret equation (3.1) to the effect
that the probability, p, that all r birthdays are different is⁵


(3.2)  $p = \frac{(365)_r}{365^r} = \left(1 - \frac{1}{365}\right)\left(1 - \frac{2}{365}\right)\cdots\left(1 - \frac{r-1}{365}\right).$


⁵ Cf. R. von Mises, Ueber Aufteilungs- und Besetzungs-Wahrscheinlichkeiten,
Revue de la Faculté des Sciences de l’Université d’Istanbul, N.S. vol. 4 (1938-1939),
pp. 145-163.


Again the numerical consequences are astounding. Thus for r = 23
people we have $p < \frac12$, that is, for 23 people the probability that at least
two people have a common birthday exceeds $\frac12$.

Formula (3.2) looks forbidding, but it is easy to derive good numeri-
cal approximations to p. If r is small, we can neglect all cross products
and have in a crude approximation⁶


(3.3)  $p \approx 1 - \frac{1 + 2 + \cdots + (r-1)}{365} = 1 - \frac{r(r-1)}{730}.$


For r = 10 the correct value is p = 0.883...; equation (3.3) gives the
approximation 0.877.

For larger r we obtain a much better approximation by passing to
logarithms. For small positive x we have $\log(1 - x) \approx -x$, and thus
from (3.2)


(3.4)  $\log p \approx -\frac{1 + 2 + \cdots + (r-1)}{365} = -\frac{r(r-1)}{730}.$


For r = 30 this leads to the approximation 0.3037 whereas the correct
value is p = 0.294. For r ≤ 40 the error in (3.4) is less than 0.08.
(For a continuation see section 7. See also answer to problem 10.44.)
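
The exact product and its two approximations can be compared side by side. The following Python sketch is only illustrative; the function names `p_all_different`, `crude`, and `via_logarithms` are our own labels for (3.2), (3.3), and (3.4), and the year is taken to have 365 days as in the text.

```python
from math import exp

def p_all_different(r, days=365):
    """Exact probability (3.2) that r birthdays are all different."""
    p = 1.0
    for i in range(r):
        p *= (days - i) / days
    return p

def crude(r, days=365):
    """Approximation (3.3): neglect all cross products."""
    return 1 - r * (r - 1) / (2 * days)

def via_logarithms(r, days=365):
    """Approximation (3.4): log p is taken as -r(r-1)/(2*days)."""
    return exp(-r * (r - 1) / (2 * days))
```

Evaluating `p_all_different` at r = 22 and r = 23 shows where the probability of all birthdays being different first drops below one half.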


4. SUBPOPULATIONS AND PARTITIONS 


As before, we use the term population of size n to denote an aggregate 
of n elements without regard to their order. Two populations are con-
sidered different if one contains an element not contained in the other.
Choosing r elements out of a given population of size n means forming 
a subpopulation of size r. In how many ways can this be done? Each 
subpopulation of size r can be arranged in r! different orders and in this
way produces r! different samples without repetition. Conversely,
each such sample of size r contains r different elements and thus defines
a subpopulation of size r. We know that there exist $(n)_r$ samples of
the described sort. If x is the number of subpopulations of size r, then
obviously the number of ordered samples is $x \cdot r!$, and we conclude that
$x = (n)_r/r!$. Numbers of this kind are known as binomial coefficients,
and the standard notation for them is


(4.1)  $\binom{n}{r} = \frac{(n)_r}{r!} = \frac{n(n-1)\cdots(n-r+1)}{1 \cdot 2 \cdots (r-1) \cdot r}.$


⁶ The sign ≈ signifies that the equality is only approximate.


We have now proved


Theorem 1. A population of n elements possesses $\binom{n}{r}$ different sub-
populations of size $r \le n$.


In other words, out of n elements, we can choose a group of r ele-
ments in $\binom{n}{r}$ different ways. Now choosing the r elements to be taken
out of the given population amounts to the same as choosing the
$n - r$ elements which are to stay in. It is therefore clear that for each
$r \le n$ we must have


(4.2)  $\binom{n}{r} = \binom{n}{n-r}.$


To prove equation (4.2) directly we observe that an alternative way
of writing the binomial coefficient (4.1) is


(4.3)  $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$


[This follows on multiplying numerator and denominator of (4.1) by
$(n - r)!$.] Note that the left side in equation (4.2) is not defined for
r = 0, but the right side is. In order to make equation (4.2) valid for
all integers r such that $0 \le r \le n$, we now define


(4.4)  $\binom{n}{0} = 1, \qquad 0! = 1,$
and $(n)_0 = 1$.
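
The relations (4.1) to (4.4) lend themselves to a quick check. The sketch below is illustrative Python under our own naming (`binom` is a hypothetical helper built from the falling product); `math.comb` and `itertools.combinations` are standard-library functions used for comparison.

```python
from itertools import combinations
from math import comb, factorial

def binom(n, r):
    """(4.1): (n)_r / r!, computed from the falling product."""
    numerator = 1
    for i in range(r):
        numerator *= n - i
    return numerator // factorial(r)

n, r = 6, 2
# Every subpopulation of size r, listed without regard to order.
subpopulations = list(combinations(range(n), r))
```

With the empty product taken as 1, `binom(n, 0)` equals 1 in agreement with the convention (4.4), and the symmetry (4.2) can be verified for every r from 0 to n.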


Examples. (a) Bridge and poker (cf. footnote 1 of chapter I).
Since the order of the cards in a hand is irrelevant, the last theorem
shows that there exist $\binom{52}{13}$ = 635,013,559,600 different hands at
bridge, and $\binom{52}{5}$ = 2,598,960 hands at poker. Let us calculate the
probability, x, that a hand at poker contains five different face values.
These face values can be chosen in $\binom{13}{5}$ ways, and corresponding to
each card we are free to choose one of the four suits. It follows that
$x = 4^5 \cdot \binom{13}{5} \div \binom{52}{5}$, which is approximately 0.5071. For bridge the
probability of thirteen different face values is $4^{13} \div \binom{52}{13}$ or, approxi-
mately, 0.0001057.

(b) Each of the 48 states has two senators. We consider the events
that in a committee of 48 senators chosen at random: (1) a given state
is represented, (2) all states are represented.

In the first case it is better to calculate the probability q of the com-
plementary event, namely, that the given state is not represented.
There are 96 senators, and 94 not from the given state. Hence,


$q = \binom{94}{48} \div \binom{96}{48} = \frac{48 \cdot 47}{96 \cdot 95} = 0.24737\ldots.$


Next, the theorem of section 2 shows that a committee including
one senator from each state can be chosen in $2^{48}$ different ways. The
probability that all states are included in the committee is, therefore,
$p = 2^{48} \div \binom{96}{48}$. Using Stirling’s formula (cf. section 9), it can be
shown that $p \approx (3\pi)^{1/2}\,2^{-46} \approx 4 \cdot 10^{-14}$.
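
The figures of examples (a) and (b) can be reproduced with exact integer arithmetic. This is only a sketch in Python, with variable names of our own choosing; `math.comb` supplies the binomial coefficients.

```python
from fractions import Fraction
from math import comb, pi, sqrt

bridge_hands = comb(52, 13)
poker_hands = comb(52, 5)

# Five different face values at poker; thirteen different face values at bridge.
x_poker = Fraction(4 ** 5 * comb(13, 5), poker_hands)
x_bridge = Fraction(4 ** 13, bridge_hands)

# Committee of 48 chosen at random among 96 senators.
q = Fraction(comb(94, 48), comb(96, 48))   # a given state is not represented
p = Fraction(2 ** 48, comb(96, 48))        # all states are represented
p_stirling = sqrt(3 * pi) * 2.0 ** -46     # the Stirling estimate for p
```

Comparing `float(p)` with `p_stirling` gives an impression of how little is lost by the Stirling estimate even for moderate n.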

(c) An occupancy problem. Consider once more a random distribu-
tion of r balls in n cells (i.e., each of the $n^r$ possible arrangements has
probability $n^{-r}$). To find the probability, $p_k$, that a specified cell con-
tains exactly k balls (k = 0, 1, ..., r) we note that the k balls can be
chosen in $\binom{r}{k}$ ways, and the remaining $r - k$ balls can be placed into
the remaining $n - 1$ cells in $(n-1)^{r-k}$ ways. It follows that


(4.5)  $p_k = \binom{r}{k} \cdot \frac{1}{n^r} \cdot (n-1)^{r-k} = \binom{r}{k} \cdot \frac{1}{n^k} \cdot \left(1 - \frac{1}{n}\right)^{r-k}.$


This is a special case of the so-called binomial distribution which will
be taken up in chapter VI. Numerical values will be found in tables
of chapter IV.
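
Formula (4.5) can be compared with a direct enumeration of all placements. The sketch below is an illustration in Python; the function names and the small values r = 4, n = 3 are our own assumptions, chosen so that the $n^r$ placements can be listed.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_k(k, r, n):
    """(4.5): probability that a specified cell holds exactly k of the r balls."""
    return comb(r, k) * Fraction(1, n) ** k * Fraction(n - 1, n) ** (r - k)

def p_k_enumerated(k, r, n):
    """Same probability by listing all n^r placements (small r and n only)."""
    # placement[i] is the cell receiving ball i; cell 0 plays the specified cell.
    placements = list(product(range(n), repeat=r))
    hits = sum(1 for placement in placements if placement.count(0) == k)
    return Fraction(hits, len(placements))

r, n = 4, 3
```

Summing `p_k(k, r, n)` over k = 0, 1, ..., r gives exactly 1, as it must for a probability distribution.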
(d) Orderings involving two kinds of elements. Consider a population
of n = a + b elements, of which a are of one kind and b of another
kind. For convenience we denote the elements by $\alpha_1, \alpha_2, \ldots, \alpha_a$;
$\beta_1, \beta_2, \ldots, \beta_b$. These elements can be ordered in n! different ways. How-
ever, if we agree to treat both the alphas and the betas as indistin-
guishable among themselves (that is, if we omit their subscripts),
certain orderings become indistinguishable. In fact, an ordering is
completely described by specifying the a places occupied by alphas,
and these a places can be chosen in $\binom{a+b}{a} = \binom{a+b}{b}$ different ways.
Accordingly, a population of a indistinguishable alphas and b indistin-
guishable betas can be arranged in $\binom{a+b}{a} = \binom{a+b}{b}$ distinguishable
orders. (For example, the sequence αααββ can be ordered in ten dis-
tinguishable ways.) Any permutation among the alphas, or among
the betas, will leave the outer appearance unchanged so that a!b! per-
mutations have the same outer appearance. It follows that if we attrib-
ute to each of the (a + b)! permutations the same probability $1 \div (a+b)!$,
then all distinguishable arrangements are equally probable, each having
probability $a!\,b! \div (a+b)!$. Thus, if we speak about equally probable
arrangements, the term applies both to distinguishable arrangements
and to the aggregate of all permutations of the elements. (This stands
in marked contrast to the case of random placements of balls into cells;
see section 5.)
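
The passage from all (a + b)! permutations to the distinguishable arrangements can be watched at work for the sequence αααββ. This is a minimal Python sketch under our own conventions: the letters "a" and "b" stand in for the alphas and betas, and `itertools.permutations` treats the five positions as if the subscripts were still attached.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

a, b = 3, 2
letters = "a" * a + "b" * b            # stands for the sequence of alphas and betas

# All (a+b)! permutations of the implicitly subscripted elements ...
all_orderings = list(permutations(letters))
# ... grouped by outer appearance once the subscripts are dropped.
appearances = Counter(all_orderings)
```

Each key of `appearances` is one distinguishable arrangement, and its count is the number of permutations sharing that outer appearance.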


Theorem 2. Let $r_1, \ldots, r_k$ be integers such that

(4.6)  $r_1 + r_2 + \cdots + r_k = n, \qquad r_i \ge 0.$


The number of ways in which a population of n elements can be divided
into k ordered parts (partitioned into k subpopulations) of which the first
contains $r_1$ elements, the second $r_2$ elements, etc. is

(4.7)  $\frac{n!}{r_1!\,r_2! \cdots r_k!}.$

(The numbers (4.7) are called multinomial coefficients.)

[Note that the order of the subpopulations is essential in the sense
that ($r_1 = 2$, $r_2 = 3$) and ($r_1 = 3$, $r_2 = 2$) represent different parti-
tions; however, no attention is paid to the order within the groups.
Note also that 0! = 1 so that the vanishing $r_i$ in no way affect for-
mula (4.7).]


Proof. A repeated use of (4.3) will show that the number (4.7) may
be rewritten in the form


(4.8)  $\binom{n}{r_1}\binom{n-r_1}{r_2}\binom{n-r_1-r_2}{r_3}\cdots\binom{n-r_1-\cdots-r_{k-2}}{r_{k-1}}.$


On the other hand, in order to effect the desired partition, we have
first to select $r_1$ elements out of the given n; of the remaining $n - r_1$
elements we select a second group of size $r_2$, etc. After forming the
(k-1)st group there remain $n - r_1 - r_2 - \cdots - r_{k-1} = r_k$ elements,
and these form the last group. We conclude that (4.8) indeed repre-
sents the number of ways in which the operation can be performed.

Examples. (e) Bridge. At a bridge table the 52 cards are parti-
tioned into four equal groups and therefore the number of different
situations is $52! \cdot (13!)^{-4} = (5.36\ldots) \cdot 10^{28}$. Let us now calculate the
probability that each player has an ace. The four aces can be ordered
in 4! = 24 ways, and each order represents one possibility of giving
one ace to each player. The remaining 48 cards can be distributed in
$(48!)(12!)^{-4}$ ways. Hence the required probability is


$24 \cdot 48! \cdot (13)^4 \div 52! = 0.105\ldots.$


(f) Dice. A throw of twelve dice can result in $6^{12}$ different out-
comes, to all of which we attribute equal probabilities. The event that
each face appears twice can occur in as many ways as twelve dice can
be arranged in six groups of two each. Hence the probability of the
event is $12!/(2^6 \cdot 6^{12}) = 0.003438\ldots$.


(In theorem 2 it is permitted that $r_i = 0$ so that in reality the n
elements are divided into k or fewer subpopulations. The case $r_i > 0$
of partitions into exactly k classes is treated in problem 11.7.)
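
Theorem 2 and the examples (e) and (f) can be checked with a few lines of Python. What follows is a sketch only; the helper names `multinomial` and `multinomial_by_steps` are our own, the first following (4.7) and the second the step-by-step selection (4.8).

```python
from fractions import Fraction
from math import comb, factorial

def multinomial(*parts):
    """(4.7): n! / (r_1! r_2! ... r_k!) with n = r_1 + ... + r_k."""
    result = factorial(sum(parts))
    for r in parts:
        result //= factorial(r)
    return result

def multinomial_by_steps(*parts):
    """(4.8): select the groups one after another from what remains."""
    remaining, result = sum(parts), 1
    for r in parts:
        result *= comb(remaining, r)
        remaining -= r
    return result

deals = multinomial(13, 13, 13, 13)                       # situations at bridge
p_each_player_has_ace = Fraction(factorial(4) * multinomial(12, 12, 12, 12), deals)
p_each_face_twice = Fraction(multinomial(2, 2, 2, 2, 2, 2), 6 ** 12)
```

Since 0! = 1, a vanishing part such as in `multinomial(2, 0, 3)` leaves the value unchanged, in line with the note following the theorem.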


*5. APPLICATION TO OCCUPANCY PROBLEMS 


The examples of chapter I, section 2, indicate the wide applicability 
of the model of placing randomly r balls into n cells. We now turn to
a discussion of this model, assuming, of course, that each of the $n^r$
possible distributions has probability $n^{-r}$. The most important prop-
erties of a particular distribution are expressed by its occupancy num-
bers $r_1, \ldots, r_n$ where $r_i$ is the number of balls in the ith cell. Here


(5.1)  $r_1 + r_2 + \cdots + r_n = r, \qquad r_i \ge 0.$


We agree to treat the balls as indistinguishable. The distribution of balls
is then completely described by its occupancy numbers, and two dis-
tributions are distinguishable only if the corresponding ordered n-tuples
$(r_1, \ldots, r_n)$ are not identical. Our first aim is to prove the


Lemma. The number of distinguishable distributions [i.e. the number
of different solutions of equation (5.1)] is⁷


$\binom{n+r-1}{r}.$


The number of distinguishable distributions in which no cell remains
empty is $\binom{r-1}{n-1}$.


* The material of this section is useful and illuminating but will not be used
explicitly in the sequel.

⁷ The special case r = 100, n = 4 has been used in example I(2.e).


Proof. We use the artifice of representing the n cells by the spaces
between n + 1 bars and the balls by stars. Thus |***|*| | | |****| is
used as a symbol for a distribution of r = 8 balls in n = 6 cells with
occupancy numbers 3, 1, 0, 0, 0, 4. Such a symbol necessarily starts
and ends with a bar, but the remaining n − 1 bars and r stars can
appear in an arbitrary order. In this way it becomes apparent that
the number of distinguishable distributions equals the number of ways
of selecting r places out of n + r − 1. The condition that no cell be
empty imposes the restriction that no two bars be adjacent. The r
stars leave r − 1 spaces of which n − 1 are to be occupied by bars:
thus we have $\binom{r-1}{n-1}$ choices and the lemma is proved.
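
The counting argument of the lemma can be cross-checked by brute force for small r and n. The sketch below simply enumerates all n-tuples of occupancy numbers; the helper name `count_distributions` is an assumption made for illustration and is not part of the text.

```python
from itertools import product
from math import comb

def count_distributions(r, n, allow_empty=True):
    """Count ordered n-tuples of occupancy numbers summing to r by enumeration."""
    low = 0 if allow_empty else 1
    return sum(1 for t in product(range(low, r + 1), repeat=n) if sum(t) == r)

r, n = 8, 4
# All distinguishable distributions versus the binomial coefficient of the lemma.
print(count_distributions(r, n), comb(n + r - 1, r))
# Distributions with no empty cell versus the second count of the lemma.
print(count_distributions(r, n, allow_empty=False), comb(r - 1, n - 1))
```

The enumeration grows like $(r+1)^n$, so it is useful only as a check on small cases.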


Examples. (a) There are $\binom{r+5}{5}$ distinguishable results of a
throw with r indistinguishable dice.

(b) Partial derivatives. The partial derivatives of order r of an
analytic function $f(x_1, \ldots, x_n)$ of n variables do not depend on the order
of differentiation but only on the number of times that each variable
appears. Thus each variable corresponds to a cell, and hence there
exist $\binom{n+r-1}{r}$ different partial derivatives of rth order. A function
of three variables has fifteen derivatives of fourth order and 21
derivatives of fifth order.


Placing r balls into n cells is one way of partitioning the population
of r balls. By theorem 2 of section 4 there exist $r! \div (r_1!\, r_2! \cdots r_n!)$
distributions with given occupancy numbers $r_1, \ldots, r_n$. This formula
still involves the order in which the occupancy numbers, or cells, appear,
but frequently this order is immaterial. The following example is
intended to illustrate an exceedingly simple and routine method of
solving many elementary combinatorial problems.


Example. (c) Configurations of r = 7 balls in n = 7 cells. (The
cells may be interpreted as days of the week, the balls as calls, letters,
accidents, etc.) For the sake of definiteness let us consider the
distributions with occupancy numbers 2, 2, 1, 1, 1, 0, 0 appearing in an
arbitrary order. These seven occupancy numbers induce a partition of
the seven cells into three subpopulations (categories) consisting,
respectively, of the two doubly occupied, the three singly occupied, and the
two empty cells. Such a partition into three groups of size 2, 3, and 2
can be effected in $7! \div (2! \cdot 3! \cdot 2!)$ ways. To each particular
assignment of our occupancy numbers to the seven cells there correspond
$7! \div (2! \cdot 2! \cdot 1! \cdot 1! \cdot 1! \cdot 0! \cdot 0!) = 7! \div (2! \cdot 2!)$ different distributions of
the r = 7 balls into the seven cells. Accordingly, the total number of
distributions such that the occupancy numbers coincide with 2, 2, 1, 1, 1,
0, 0 in some order is


(5.3)  $\frac{7!}{2!\,3!\,2!} \times \frac{7!}{2!\,2!}.$


It will be noticed that this result has been derived by a double
application of (4.7), namely to balls and to cells. The same result can be
derived and rewritten in many ways, but the present method provides
the simplest routine technique for a great variety of problems. (Cf.
problems 43-45 of section 10.) Table 1 contains the analogue to (5.3)
and the probabilities for all possible configurations of occupancy
numbers in the case r = n = 7.


TABLE 1

RANDOM DISTRIBUTIONS OF 7 BALLS IN 7 CELLS

                         Number of Arrangements     Probability (Number
Occupancy                Equals 7! × 7!             of Arrangements
Numbers                  Divided by                 Divided by $7^7$)

1, 1, 1, 1, 1, 1, 1      7! × 1!                    0.006 120
2, 1, 1, 1, 1, 1, 0      5! × 2!                     .128 518
2, 2, 1, 1, 1, 0, 0      2!3!2! × 2!2!               .321 295
2, 2, 2, 1, 0, 0, 0      3!3! × 2!2!2!               .107 098
3, 1, 1, 1, 1, 0, 0      4!2! × 3!                   .107 098
3, 2, 1, 1, 0, 0, 0      2!3! × 3!2!                 .214 197
3, 2, 2, 0, 0, 0, 0      2!4! × 3!2!2!               .026 775
3, 3, 1, 0, 0, 0, 0      2!4! × 3!3!                 .017 850
4, 1, 1, 1, 0, 0, 0      3!3! × 4!                   .035 699
4, 2, 1, 0, 0, 0, 0      4! × 4!2!                   .026 775
4, 3, 0, 0, 0, 0, 0      5! × 4!3!                   .001 785
5, 1, 1, 0, 0, 0, 0      2!4! × 5!                   .005 355
5, 2, 0, 0, 0, 0, 0      5! × 5!2!                   .001 071
6, 1, 0, 0, 0, 0, 0      5! × 6!                     .000 357
7, 0, 0, 0, 0, 0, 0      6! × 7!                     .000 008
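
The entries of Table 1 follow from the double application of the multinomial count, once to the cells and once to the balls. As a hedged illustration, the sketch below recomputes the number of arrangements for every configuration of occupancy numbers; the helper names `partitions` and `arrangements` are hypothetical and chosen only for this example.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def partitions(total, parts, largest=None):
    """Yield non-increasing tuples of `parts` non-negative integers summing to `total`."""
    if largest is None:
        largest = total
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(min(total, largest), -1, -1):
        for rest in partitions(total - first, parts - 1, first):
            yield (first,) + rest

def arrangements(occupancy):
    """Number of placements whose occupancy numbers equal `occupancy` in some order."""
    n, r = len(occupancy), sum(occupancy)
    # Ways to assign the occupancy numbers to the cells (multinomial over cells).
    cells = factorial(n)
    for multiplicity in Counter(occupancy).values():
        cells //= factorial(multiplicity)
    # Ways to distribute the balls once the assignment is fixed (multinomial over balls).
    balls = factorial(r)
    for k in occupancy:
        balls //= factorial(k)
    return cells * balls

table = {occ: arrangements(occ) for occ in partitions(7, 7)}
for occ, count in sorted(table.items()):
    print(occ, count, float(Fraction(count, 7**7)))
```

Since every one of the $7^7$ placements belongs to exactly one configuration, the counts must add up to $7^7$, which gives a convenient consistency check.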


Note on Bose-Einstein and Fermi-Dirac statistics. Up to now we have
assumed that each of the $n^r$ possible distributions has probability $n^{-r}$. It is of
interest that facts and experience have compelled physicists to abandon this
hypothesis and to assign probabilities in different ways.

Consider a mechanical system of r indistinguishable particles. In statistical
mechanics it is usual to subdivide the phase space into a large number, n, of small
regions or cells so that each particle is assigned to one cell. In this way the state
of the entire system is described in terms of a random distribution of the r particles
in n cells. Offhand it would seem that (at least with an appropriate definition of
the n cells) all $n^r$ arrangements should have equal probabilities. If this is true, the
physicist speaks of Maxwell-Boltzmann statistics (the term “statistics” is here used
in a sense peculiar to physics). Numerous attempts have been made to prove that
physical particles behave in accordance with Maxwell-Boltzmann statistics, but
modern theory has shown beyond doubt that this statistics does not apply to any
known particles; in no case are all $n^r$ arrangements approximately equally probable.
Two different probability models have been introduced, and each describes satis- 
factorily the behavior of one type of particle. The justification of either model 
depends on its success. Neither claims universality, and it is possible that some 
day a third model may be introduced for certain kinds of particles. 

Remember that we are here concerned only with indistinguishable particles. We 
have r particles and n cells. By Bose-Einstein statistics we mean that only distin- 
guishable arrangements are considered and that each is assigned probability 


(5.4)  $\binom{n+r-1}{r}^{-1}.$


It is shown in statistical mechanics that this assumption holds true for photons,
nuclei, and atoms containing an even number of elementary particles.⁸ To describe
other particles a third possible assignment of probabilities must be introduced.
Fermi-Dirac statistics is based on these hypotheses: (1) it is impossible for two or
more particles to be in the same cell, and (2) all distinguishable arrangements satisfying
the first condition have equal probabilities. The first hypothesis requires that $r \le n$.
An arrangement is then completely described by stating which of the n cells
contain a particle; and since there are r particles, the corresponding cells can be chosen
in $\binom{n}{r}$ ways. Hence, with Fermi-Dirac statistics there are in all $\binom{n}{r}$ possible
arrangements, each having probability $\binom{n}{r}^{-1}$. This model applies to electrons,
neutrons, and protons. We have here an instructive example of the impossibility of
selecting or justifying probability models by a priori arguments. In fact, no pure
reasoning could tell that photons and protons would not obey the same probability
laws. (Essential differences between Maxwell-Boltzmann and Bose-Einstein
statistics are discussed in section 11, problems 14-19.)

To sum up: the probability that cells number 1, 2, ..., n contain $r_1, r_2, \ldots, r_n$ balls,
respectively (where $r_1 + \cdots + r_n = r$) equals


(5.5)  $\frac{r!}{r_1!\, r_2! \cdots r_n!}\, n^{-r}$


under Maxwell-Boltzmann statistics; it is given by (5.4) under Bose-Einstein statistics;


8 Cf. H. Margenau and G. M. Murphy, The mathematics of physics and chemistry, 
New York, 1943, Chapter 12. 


40 COMBINATORIAL ANALYSIS [II.5


and it equals $\binom{n}{r}^{-1}$ under Fermi-Dirac statistics provided each $r_i$ equals 0 or 1.


Note that “Maxwell-Boltzmann statistics” is the physicist’s term for what we call 
random placement of balls into cells. 


Examples. (a) Let n = 5, r = 3. The arrangement (*|−|*|*|−) has probability
6/125, 1/35, or 1/10, according to whether Maxwell-Boltzmann, Bose-Einstein, or
Fermi-Dirac statistics is used. See also example I(6.b).
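
The three values in example (a) can be reproduced directly from (5.5), (5.4), and the Fermi-Dirac count. The sketch below is illustrative only; the function names are assumptions, and an arrangement is represented by its tuple of occupancy numbers.

```python
from fractions import Fraction
from math import comb, factorial

def maxwell_boltzmann(occ):
    """Probability (5.5): multinomial count of placements divided by n^r."""
    r, n = sum(occ), len(occ)
    ways = factorial(r)
    for k in occ:
        ways //= factorial(k)
    return Fraction(ways, n**r)

def bose_einstein(occ):
    """Probability (5.4): all distinguishable arrangements equally likely."""
    r, n = sum(occ), len(occ)
    return Fraction(1, comb(n + r - 1, r))

def fermi_dirac(occ):
    """Equal probability for arrangements with at most one particle per cell."""
    r, n = sum(occ), len(occ)
    if any(k > 1 for k in occ):
        return Fraction(0)
    return Fraction(1, comb(n, r))

occ = (1, 0, 1, 1, 0)  # the arrangement (*|-|*|*|-) with n = 5, r = 3
print(maxwell_boltzmann(occ), bose_einstein(occ), fermi_dirac(occ))
```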

(b) Misprints. A book contains n symbols (letters), of which r are misprinted. 
The distribution of misprints corresponds to a distribution of r balls in n cells with 
no cell containing more than one ball. It is therefore reasonable to suppose that, 
approximately, the misprints obey the Fermi-Dirac statistics. (Cf. problem 10.38.) 


5a. Application to Runs. In any ordered sequence of elements of two kinds,
each maximal subsequence of elements of like kind is called a run. For example,
the sequence αααβααβββα opens with an alpha run of length 3; it is followed by runs
of length 1, 2, 3, 1, respectively. The alpha and beta runs alternate so that the
total number of runs is always one plus the number of unlike neighbors in the given
sequence.

Examples of applications. The theory of runs is applied in statistics in many 
ways, but its principal uses are connected with tests of randomness or tests of 
homogeneity. 

(a) In testing randomness, the problem is to decide whether a given observation
is attributable to chance or whether a search for assignable causes is indicated.
As a simple example suppose that an observation⁹ yielded the following
arrangement of empty and occupied seats along a lunch counter: EOEEOEEEOEEEOEOE.
Note that no two occupied seats are adjacent. Can this be due to chance? With
five occupied and eleven empty seats it is impossible to get more than eleven runs,
and this number was actually observed. It will be shown later that if all
arrangements were equally probable the probability of eleven runs would be 0.0578....
This small probability to some extent confirms the hunch that the separations
observed were intentional. This suspicion cannot be proved by statistical methods,
but further evidence could be collected from continued observation. If the lunch
counter were frequented by families, there would be a tendency for occupants to
cluster together, and this would lead to relatively small numbers of runs. Similarly,
counting runs of boys and girls in a classroom might disclose the mixing to be better
or worse than random. Improbable arrangements give clues to assignable causes;
an excess of runs points to intentional mixing, a paucity of runs to intentional
clustering. It is true that these conclusions are never foolproof, but efficient statistical
techniques have been developed which in actual practice minimize the risk of
incorrect conclusions.

The theory of runs is also useful in industrial quality control as introduced by
Shewhart. As washers are produced, they will vary in thickness. Long runs of
thick washers may suggest imperfections in the production process and lead to
the removal of the causes; thus oncoming trouble may be forestalled and greater
homogeneity of product achieved.

In biological field experiments successions of healthy and diseased plants are
counted, and long runs are suggestive of contagion. The meteorologist watches
successions of dry and wet months¹⁰ to discover clues to a tendency of the weather to
persist.


9 F. S. Swed and C. Eisenhart, Tables for testing randomness of grouping in a
sequence of alternatives, Annals of Mathematical Statistics, vol. 14 (1943), pp. 66-87.


(b) To understand a typical problem of homogeneity, suppose that two drugs have
been applied to two sets of patients, or that we are interested in comparing the
efficiency of two treatments (medical, agricultural, or industrial). In practice, we
shall have two sets of observations, say, $\alpha_1, \alpha_2, \ldots, \alpha_a$ and $\beta_1, \beta_2, \ldots, \beta_b$
corresponding to the two treatments or representing a certain characteristic (such as weight)
of the elements of two populations. The alphas and betas are numbers which
we imagine ordered in increasing order of magnitude: $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_a$ and
$\beta_1 \le \beta_2 \le \cdots \le \beta_b$. We now pool the two sets into one sequence ordered according
to magnitude. An extreme case is that all alphas precede all betas, and this may
be taken as indicative of a significant difference between the two treatments or
populations. On the other hand, if the two treatments are identical, the alphas and
betas should appear more or less in random order. Wald and Wolfowitz¹¹ have
shown that the theory of runs can often be applied advantageously to discover
small systematic differences. (An illustrative example, treated by a different
method, will be found in chapter III, section 1.)


Many problems concerning runs can be solved in an exceedingly simple manner.
Given a indistinguishable alphas and b indistinguishable betas, we know from
example (4.d) that there are $\binom{a+b}{a}$ distinguishable orderings. If there are $n_1$ alpha
runs, the number of beta runs is necessarily one of the numbers $n_1 \pm 1$ or $n_1$.
Arranging the a alphas in $n_1$ runs is equivalent to arranging them into $n_1$ cells,
none of which is empty. By the last lemma this can be done in $\binom{a-1}{n_1-1}$
distinguishable ways. It follows, for example, that there are $\binom{a-1}{n_1-1}\binom{b-1}{n_1}$
arrangements with $n_1$ alpha runs and $n_1 + 1$ beta runs (continued in problems 20-25
of section 11).
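
The count of arrangements with $n_1$ alpha runs and $n_1 + 1$ beta runs can be compared with a direct enumeration for small a and b. This is only a sketch under the stated setup; the function name `count_by_runs` and the letters "A" and "B" standing for alphas and betas are illustrative choices.

```python
from itertools import combinations, groupby
from math import comb

def count_by_runs(a, b):
    """Tally orderings of a alphas and b betas by (alpha runs, beta runs)."""
    tally = {}
    for alpha_positions in combinations(range(a + b), a):
        chosen = set(alpha_positions)
        seq = ["A" if i in chosen else "B" for i in range(a + b)]
        # groupby collapses each maximal block of equal symbols into one run.
        runs = [kind for kind, _ in groupby(seq)]
        key = (runs.count("A"), runs.count("B"))
        tally[key] = tally.get(key, 0) + 1
    return tally

a, b = 5, 7
tally = count_by_runs(a, b)
for n1 in range(1, a + 1):
    formula = comb(a - 1, n1 - 1) * comb(b - 1, n1)
    print(n1, tally.get((n1, n1 + 1), 0), formula)
```

The enumeration visits all $\binom{a+b}{a}$ orderings, so it is practical only for short sequences.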

(c) In physics, the theory of runs is used in the study of cooperative phenomena.
In Ising’s theory of one-dimensional lattices the energy depends on the number of
unlike neighbors, that is, the number of runs.


6. THE HYPERGEOMETRIC DISTRIBUTION 

Many combinatorial problems can be reduced to the following form.
In a population of n elements $n_1$ are red and $n_2 = n - n_1$ are black.
A group of r elements is chosen at random. We seek the probability
$q_k$ that the group so chosen will contain exactly k red elements. Here
k can be any integer between zero and $n_1$ or r, whichever is smaller.


10 W. G. Cochran, An extension of Gold’s method of examining the apparent
persistence of one type of weather, Quarterly Journal of the Royal Meteorological
Society, vol. 64, No. 277 (1938), pp. 631-634.
11 A. Wald and J. Wolfowitz, On a test whether two samples are from the same
population, Annals of Mathematical Statistics, vol. 11 (1940), pp. 147-162.
To find $q_k$, we note that the chosen group contains k red and r − k
black elements. The red ones can be chosen in $\binom{n_1}{k}$ different ways
and the black ones in $\binom{n-n_1}{r-k}$ ways. Since any choice of k red
elements may be combined with any choice of black ones, we find


(6.1)  $q_k = \frac{\binom{n_1}{k}\binom{n-n_1}{r-k}}{\binom{n}{r}}.$


The system of probabilities so defined is called the hypergeometric
distribution.¹² Using formula (4.3), it is possible to rewrite (6.1) in the
form


(6.2)  $q_k = \frac{\binom{r}{k}\binom{n-r}{n_1-k}}{\binom{n}{n_1}}.$

Note. The probabilities $q_k$ are defined only for k not exceeding r or
$n_1$. However, from the definition (4.1) it follows that $\binom{a}{b} = 0$
whenever b > a. Therefore, formulas (6.1) and (6.2) give $q_k = 0$ if either
$k > n_1$ or $k > r$. Accordingly, the definitions (6.1) and (6.2) may be
used for all $k \ge 0$, provided the relation $q_k = 0$ is interpreted as
impossibility.
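
That the two forms (6.1) and (6.2) agree, vanish outside the admissible range of k, and add to unity can be checked numerically. The following is a minimal sketch; the function names `q_first_form` and `q_second_form` are assumptions for illustration. Python's `math.comb` already returns 0 when the lower index exceeds the upper one, so only negative lower indices need a guard.

```python
from fractions import Fraction
from math import comb

def q_first_form(k, n, n1, r):
    """Hypergeometric probability in the form (6.1)."""
    if k < 0 or r - k < 0:
        return Fraction(0)
    return Fraction(comb(n1, k) * comb(n - n1, r - k), comb(n, r))

def q_second_form(k, n, n1, r):
    """Hypergeometric probability in the form (6.2)."""
    if k < 0 or n1 - k < 0:
        return Fraction(0)
    return Fraction(comb(r, k) * comb(n - r, n1 - k), comb(n, n1))

n, n1, r = 20, 7, 9
for k in range(0, 11):
    print(k, q_first_form(k, n, n1, r), q_second_form(k, n, n1, r))
```

For k larger than $n_1$ or r both functions return zero, in line with the note above.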


Examples. (a) Quality inspection. In industrial quality control,
lots of size n are subjected to sampling inspection. The defective
items in the lot play the role of “red” elements. Their number $n_1$ is,
of course, unknown. A sample of size r is taken, and the number k
of defective items in it is determined. Formula (6.1) then permits us
to draw inferences about the likely magnitude of $n_1$; this is a typical
problem of statistical estimation and is beyond the scope of the
present book.

(b) In example (4.b), the population consists of n = 96 senators of
whom $n_1 = 2$ represent the given state (are “red”). A group of
r = 48 senators is chosen at random. It may include k = 0, 1, or 2
senators from the given state. From (6.2) we find, remembering (4.4),


$q_0 = q_2 = \frac{47}{190} = 0.24737\ldots, \qquad q_1 = \frac{48}{95} = 0.50526\ldots.$


The value $q_0$ was obtained in a different way in example (4.b).


12 The name is explained by the fact that the generating function (cf. chapter XI)
of $\{q_k\}$ can be expressed in terms of hypergeometric functions.

(c) Estimation of the size of an animal population from recapture data.¹³
Suppose that 1000 fish caught in a lake are marked by red spots and 
released. After a while a new catch of 1000 fish is made, and it is 
found that 100 among them have red spots. What conclusions can be 
drawn concerning the number of fish in the lake? This is a typical 
problem of statistical estimation. It would lead us too far to describe 
the various methods that a modern statistician might use, but we shall 
show how the hypergeometric distribution gives us a clue to the solu- 
tion of the problem. We assume naturally that the two catches may 
be considered as random samples from the population of all fish in the 
lake. (In practice this assumption excludes situations where the two 
catches are made at one locality and within a short time.) We also 
suppose that the number of fish in the lake does not change between 
the two catches. 

We generalize the problem by admitting arbitrary sample sizes. Let 


n = the (unknown) number of fish in the lake.
$n_1$ = the number of fish in the first catch. They play the role of
red balls.
r = the number of fish in the second catch.
k = the number of red fish in the second catch.
$q_k(n)$ = the probability that the second catch contains exactly k red
fish.


In this formulation it is rather obvious that $q_k(n)$ is given by (6.1).
In practice $n_1$, r, and k can be observed, but n is unknown. Notice,
incidentally, that n is a fixed number which in no way depends on
chance. It is, therefore, meaningless to ask for the probability that n
is greater than, say, 6000. We know that $n_1 + r - k$ different fish
were caught, and therefore $n \ge n_1 + r - k$. This is all that can be


13 This example was used in the first edition without knowledge that the method
is widely used in practice. Newer contributions to the literature include N. T. J.
Bailey, On estimating the size of mobile populations from recapture data,
Biometrika, vol. 38 (1951), pp. 293-306, and D. G. Chapman, Some properties of the
hypergeometric distribution with applications to zoological sample censuses,
University of California Publications in Statistics, vol. 1 (1951), pp. 131-160.

said with certainty. In our example we had $n_1 = r = 1000$ and
k = 100, and it is conceivable that the lake contains only 1900 fish.
However, starting from this hypothesis, we are led to the conclusion 
that an event of a fantastically small probability has occurred. In fact, 
assuming that there are n = 1900 fish in all, the probability that two
samples of size 1000 each will between them exhaust the entire
population is, by (6.1),


$$\binom{1000}{100}\binom{900}{900}\binom{1900}{1000}^{-1} = \frac{(1000!)^2}{100!\,1900!}.$$


Stirling’s formula (cf. section 9) shows this probability to be of the
order of magnitude $10^{-430}$, and in this situation common sense bids us
to reject our hypothesis as unreasonable. A similar reasoning would
induce us to reject the hypothesis that n is very large, say, a million.
This consideration leads us to seek the particular value of n for which
$q_k(n)$ attains its largest value, since for that n our observation would
have the greatest probability. For any particular set of observations
$n_1$, r, k, the value of n for which $q_k(n)$ is largest is denoted by $\hat{n}$ and is
called the maximum likelihood estimate of n. This notion was
introduced by R. A. Fisher. To find $\hat{n}$ consider the ratio


(6.3)  $\frac{q_k(n)}{q_k(n-1)} = \frac{(n-n_1)(n-r)}{(n-n_1-r+k)\,n}.$


A simple calculation shows that this ratio is greater than or smaller
than unity, according as $nk < n_1 r$ or $nk > n_1 r$. This means that with
increasing n the sequence $q_k(n)$ first increases and then decreases; it
reaches its maximum when n is the largest integer short of $n_1 r/k$, so
that $\hat{n}$ equals about $n_1 r/k$. In our particular example the maximum
likelihood estimate of the number of fish is $\hat{n} = 10{,}000$.
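
The behaviour of the ratio (6.3) can be followed step by step for the fish example. The sketch below, with the hypothetical helper `likelihood_ratio`, starts at the smallest admissible population size and advances n as long as the ratio for the next value exceeds unity.

```python
from fractions import Fraction

def likelihood_ratio(n, n1, r, k):
    """The ratio q_k(n) / q_k(n - 1) of formula (6.3), as an exact fraction."""
    return Fraction((n - n1) * (n - r), (n - n1 - r + k) * n)

n1, r, k = 1000, 1000, 100
n = n1 + r - k  # smallest population size compatible with the observations
while likelihood_ratio(n + 1, n1, r, k) > 1:
    n += 1
print(n, likelihood_ratio(n + 1, n1, r, k))
```

In this example $n_1 r/k$ is itself an integer, so the loop stops at 9999 and the ratio for n = 10,000 is exactly 1; the two neighbouring values therefore give the same maximal probability, in agreement with the estimate of about 10,000.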

The true number n may be larger or smaller, and we may ask for
limits within which we may reasonably expect n to lie. For this
purpose let us test the hypothesis that n is smaller than 8500. We
substitute in (6.1) n = 8500, $n_1 = r = 1000$, and calculate the probability
that the second sample contains 100 or fewer red fish. This probability
is $x = q_0 + q_1 + \cdots + q_{100}$. A direct evaluation is cumbersome, but
using the normal approximation of chapter VII, we find easily that
x = 0.04. Similarly, if n = 12,000, the probability that the second
sample contains 100 or more red fish is about 0.03. These figures
would justify a bet that the true number n of fish lies somewhere
between 8500 and 12,000. There exist other ways of formulating these
conclusions and other methods of estimation, but we do not propose to
discuss the details.
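
With arbitrary-precision integers the sums that were called cumbersome above can be evaluated directly from (6.1), which gives a way to compare with the normal approximation. This is only a sketch; the helper `tail_numerator` is a hypothetical name, and no particular digits beyond the rough sizes quoted in the text are claimed here.

```python
from fractions import Fraction
from math import comb

def tail_numerator(n, n1, r, ks):
    """Sum of the numerators of (6.1) over the given values of k."""
    return sum(comb(n1, k) * comb(n - n1, r - k) for k in ks)

n1 = r = 1000

# n = 8500: probability of 100 or fewer red fish in the second catch.
lower_tail = Fraction(tail_numerator(8500, n1, r, range(0, 101)), comb(8500, r))

# n = 12000: probability of 100 or more red fish, via the complement.
upper_tail = 1 - Fraction(tail_numerator(12000, n1, r, range(0, 100)), comb(12000, r))

print(float(lower_tail), float(upper_tail))
```

Summing the integer numerators first and dividing once keeps the computation exact until the final conversion to floating point.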


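From the definition of the probabilities $q_k$ it follows that
$q_0 + q_1 + q_2 + \cdots = 1$. Formula (6.2) therefore implies that for any positive
integers n, $n_1$, and r


(6.4)  $\binom{r}{0}\binom{n-r}{n_1} + \binom{r}{1}\binom{n-r}{n_1-1} + \cdots + \binom{r}{n_1}\binom{n-r}{0} = \binom{n}{n_1}.$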


This identity is frequently useful. We have proved it only for positive 
integers n and r, but it holds true without this restriction for arbitrary 
positive or negative numbers n and r (it is meaningless if nı is not a 
positive integer). (An indication of two proofs is given in section 12, 
problems 8 and 9.) 

The hypergeometric distribution can easily be generalized to the case
where the original population of size n contains several classes of
elements. For example, let the population contain three classes of sizes
$n_1$, $n_2$, and $n - n_1 - n_2$, respectively. If a sample of size r is taken,
the probability that it contains $k_1$ elements of the first, $k_2$ elements of
the second, and $r - k_1 - k_2$ elements of the last class is, by analogy
with (6.1),


(6.5)  $\binom{n_1}{k_1}\binom{n_2}{k_2}\binom{n-n_1-n_2}{r-k_1-k_2} \div \binom{n}{r}.$


It is, of course, necessary that $k_1 \le n_1$, $k_2 \le n_2$, and
$r - k_1 - k_2 \le n - n_1 - n_2$.
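
Formula (6.5) extends in the same way to any number of classes: the numerator is a product of one binomial coefficient per class. A minimal sketch follows; the function name `class_counts_probability` and the sample data (a population of 52 in four classes of 13, sample of 13) are chosen here purely for illustration.

```python
from fractions import Fraction
from math import comb, prod

def class_counts_probability(class_sizes, counts):
    """Probability that a random sample contains counts[i] elements of class i."""
    n, r = sum(class_sizes), sum(counts)
    ways = prod(comb(size, k) for size, k in zip(class_sizes, counts))
    return Fraction(ways, comb(n, r))

# Four classes of 13 elements each; the sample contains 5, 4, 3, and 1 of them.
p = class_counts_probability((13, 13, 13, 13), (5, 4, 3, 1))
print(p, float(p))
```

With two classes the function reduces to (6.1), which offers a simple sanity check.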

Example. (d) Bridge. The population of 52 cards consists of four
classes, each of thirteen elements. The probability that a hand of
thirteen cards consists of five spades, four hearts, three diamonds, and one
club is

$$\binom{13}{5}\binom{13}{4}\binom{13}{3}\binom{13}{1} \div \binom{52}{13} \approx 0.00539.$$


7. EXAMPLES FOR WAITING TIMES


In this section we shall depart from the straight path of combinatorial 
analysis in order to consider some sample spaces of a novel type to 
which we are led by a simple variation of our occupancy problems. 
Consider once more the conceptual “experiment” of placing balls ran- 
domly into n cells. This time, however, we do not fix in advance the 
number r of balls but let the balls be placed one by one as long as nec- 
essary for a prescribed situation to arise. Two such possible situations 
will be discussed explicitly: (i) The random placing of balls continues 
until for the first time a ball is placed into a cell already occupied. The
process terminates when the first duplication of this type occurs.
(ii) We fix a cell (say cell number 1) and continue the procedure of placing
balls as long as this cell remains empty. The process terminates when a
ball is placed into the prescribed cell.

A few interpretations of this model will elucidate the problem. 


Examples. (a) Birthdays. In the birthday example (3.d), the 
n = 365 days of the year correspond to cells, and people to balls. 
Our model (i) now amounts to this: If we select people at random one 
by one, how many people shall we have to sample in order to find a 
pair with a common birthday? Model (ii) corresponds to waiting for 
my birthday to turn up in the sample. 

(b) Key problem. A man wants to open his door. He has n keys,
of which only one fits the door. For reasons which can only be
surmised, he tries the keys at random so that at each try each key has
probability $n^{-1}$ of being tried and all possible outcomes involving the
same number of trials are equally likely. What is the probability that
the man will succeed exactly at the rth trial? This is a special case of
model (ii). It is interesting to compare this random search for the key
with a more systematic approach (problem 10.11; see also problem
V, 5).

(c) In the preceding example we can replace the sampling of keys 
by a sampling from an arbitrary population, say by the collecting of 
coupons. Again we ask when the first duplication is to be expected 
and when a prescribed element will show up for the first time. 

(d) Coins and dice. In example I(5.a) a coin is tossed as often as
necessary to turn up one head. This is a special case of model (ii) 
with n = 2. When a die is thrown until an ace turns up for the first 
time, the same question applies with n = 6. (Other waiting times are 
treated in problems 21, 22, and 36 of section 10, and 12 of section 11.) 


We begin with the conceptually simpler model (i). It is convenient
to use symbols of the form $(j_1, j_2, \ldots, j_r)$ to indicate that the first,
second, ..., rth ball are placed in cells number $j_1, j_2, \ldots, j_r$ and that
the process terminates at the rth step. This means that the $j_i$ are
integers between 1 and n; furthermore, $j_1, \ldots, j_{r-1}$ are all different, but
$j_r$ equals one among them. Every arrangement of this type represents
a sample point. For r only the values 2, 3, ..., n + 1 are possible, since
a doubly occupied cell cannot appear before the second ball or after the
(n+1)st ball is placed. The connection of our present problem with
the old model of placing a fixed number of balls into the n cells leads
us to attribute to each sample point $(j_1, \ldots, j_r)$ involving exactly r
balls the probability $n^{-r}$. We proceed to show that this convention is
permissible (i.e., that our probabilities add to unity) and that it leads
to reasonable results.

For a fixed r the aggregate of all sample points $(j_1, \ldots, j_r)$ represents
the event that the process terminates at the rth step. According to (8.1)
the numbers $j_1, \ldots, j_{r-1}$ can be chosen in $(n)_{r-1}$ different ways; for
$j_r$ we have the choice of the r − 1 numbers $j_1, \ldots, j_{r-1}$. It follows
that the probability of the process terminating at the rth step is


(7.1)  $q_r = \frac{(n)_{r-1}\,(r-1)}{n^r} = \left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{r-2}{n}\right)\frac{r-1}{n}$


with $q_1 = 0$ and $q_2 = 1/n$. The probability that the process lasts for
more than r steps is $p_r = 1 - (q_1 + q_2 + \cdots + q_r)$ or $p_1 = 1$ and


(7.2)  $p_r = \left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{r-1}{n}\right)$


as can be seen by simple induction. In particular, $p_{n+1} = 0$ and
$q_1 + \cdots + q_{n+1} = 1$, as is proper. Furthermore, when n = 365,
formula (7.2) reduces to (3.2), and in general our new model leads to the
same quantitative results as the previous model involving a fixed
number of balls.
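
The relation between (7.1) and (7.2), and the fact that the probabilities $q_r$ add to unity, can be verified exactly for a small number of cells. In the sketch below the function names `q` and `p` mirror the symbols of the text, while the choice n = 10 is arbitrary.

```python
from fractions import Fraction

def q(r, n):
    """Probability (7.1) that the first duplication occurs at the r-th ball."""
    if r < 2 or r > n + 1:
        return Fraction(0)
    falling = 1
    for i in range(r - 1):  # (n)_{r-1} = n (n - 1) ... (n - r + 2)
        falling *= n - i
    return Fraction(falling * (r - 1), n**r)

def p(r, n):
    """Probability (7.2) that the process lasts for more than r steps."""
    result = Fraction(1)
    for i in range(1, r):
        result *= Fraction(n - i, n)
    return result

n = 10
running = Fraction(0)
for r in range(1, n + 2):
    running += q(r, n)
    print(r, q(r, n), p(r, n), 1 - running)  # last two columns agree
```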


The model (ii) differs from (i) in that it depends on an infinite sample
space. The sequences $(j_1, \ldots, j_r)$ are now subjected to the condition
that the numbers $j_1, \ldots, j_{r-1}$ are different from a prescribed number
$a \le n$, but $j_r = a$. Moreover, there is no a priori reason why the
process should ever terminate. For a fixed r we attribute again to each
sample point of the form $(j_1, \ldots, j_r)$ probability $n^{-r}$. For $j_1, \ldots, j_{r-1}$
we have n − 1 choices each, and for $j_r$ no choice at all. For the
probability that the process terminates at the rth step we get therefore


(7.3)  $q_r^* = \left(\frac{n-1}{n}\right)^{r-1} \frac{1}{n}, \qquad r = 1, 2, \ldots.$


Summing this geometric series we find $q_1^* + q_2^* + \cdots = 1$. Thus the
probabilities add to unity, and there is no necessity of introducing a
sample point to represent the possibility that no ball will ever be placed
into the prescribed cell number a. For the probability


$p_r^* = 1 - (q_1^* + \cdots + q_r^*)$


that the process lasts for more than r steps we get


(7.4)  $p_r^* = \left(1 - \frac{1}{n}\right)^r, \qquad r = 1, 2, \ldots$


as was to be expected.


The medians for the distributions $\{p_r\}$ and $\{p_r^*\}$ are defined as those
values of r for which $p_r$ and $p_r^*$ come closest to 1/2; it is about as likely
that the process continues beyond the median as that it stops before.
(In the birthday example (3.d) the median is r = 23.) To calculate the
median for $\{p_r\}$ we pass to logarithms as we did in (3.4). When r is
small as compared to n, we see that $-\log p_r$ is close to $r^2/2n$. It
follows that the median to $\{p_r\}$ is close to $(2n \log 2)^{1/2}$ or, approximately,
$\frac{6}{5} n^{1/2}$. It is interesting that the median increases with the square root
of the population size. By contrast, the median for $\{p_r^*\}$ is close to
$n \log 2$ or 0.7n and increases linearly with n. The probability of the
waiting time in model (ii) to exceed n is $(1 - n^{-1})^n$ or, approximately,
$e^{-1} = 0.36788\ldots$.
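
The two medians and their approximations can be compared numerically for the birthday case n = 365. This is a rough sketch in floating-point arithmetic; the function names and the search bound `r_max` are assumptions introduced for the example.

```python
from math import log, sqrt

def p_duplication(r, n):
    """Formula (7.2): no duplication among the first r balls."""
    prob = 1.0
    for i in range(1, r):
        prob *= 1 - i / n
    return prob

def p_prescribed(r, n):
    """Formula (7.4): the prescribed cell is still empty after r balls."""
    return (1 - 1 / n) ** r

def median(tail, n, r_max):
    """Value of r in 1..r_max for which tail(r, n) comes closest to 1/2."""
    return min(range(1, r_max + 1), key=lambda r: abs(tail(r, n) - 0.5))

n = 365
print(median(p_duplication, n, n + 1), sqrt(2 * n * log(2)))
print(median(p_prescribed, n, 10 * n), n * log(2))
```

The first line contrasts the exact median of model (i) with the square-root approximation, the second does the same for model (ii) and the linear approximation.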


8. BINOMIAL COEFFICIENTS


We have used binomial coefficients $\binom{n}{r}$ only when n is a positive
integer, but it is very convenient to extend their definition. The
number $(x)_r$ introduced in equation (2.1), namely


(8.1)  $(x)_r = x(x-1)\cdots(x-r+1)$


is well defined for all real x provided only that r is a positive integer.
For r = 0 we put $(x)_0 = 1$. Then


(8.2)  $\binom{x}{r} = \frac{(x)_r}{r!} = \frac{x(x-1)\cdots(x-r+1)}{r!}$


defines the binomial coefficients for all values of x and all positive integers r.
For r = 0 we put, as in (4.4), $\binom{x}{0} = 1$ and 0! = 1. For negative integers
r we define

(8.3)  $\binom{x}{r} = 0 \qquad (r < 0).$


We shall never use the symbol $\binom{x}{r}$ if r is not an integer.

It is easily verified that with this definition we have, for example,


(8.4)  $\binom{-1}{r} = (-1)^r, \qquad \binom{-2}{r} = (-1)^r (r+1).$


Three important properties will be used in the sequel. First, for any
positive integer n


(8.5)  $\binom{n}{r} = 0$  if either $r > n$ or $r < 0$.


Second, for any number x and any integer r


(8.6)  $\binom{x}{r-1} + \binom{x}{r} = \binom{x+1}{r}.$


These relations are easily verified from the definition. The proof of
the next relation can be found in calculus textbooks: for any number a
and all values −1 < t < 1, we have Newton’s binomial formula


(8.7)  $(1+t)^a = 1 + \binom{a}{1}t + \binom{a}{2}t^2 + \binom{a}{3}t^3 + \cdots.$


If a is a positive integer, all terms to the right containing powers higher
than $t^a$ vanish automatically and the formula is correct for all t. If a
is not a positive integer, the right side represents an infinite series.

Using equation (8.4), we see that for a = −1 the expansion (8.7)
reduces to the geometric series


(8.8)  $\frac{1}{1+t} = 1 - t + t^2 - t^3 + t^4 - + \cdots.$


Integrating (8.8), we obtain another formula which will be useful in
the sequel, namely, the Taylor expansion of the natural logarithm


(8.9)  $\log(1+t) = t - \tfrac{1}{2}t^2 + \tfrac{1}{3}t^3 - \tfrac{1}{4}t^4 + - \cdots.$


Two alternative forms for (8.9) are frequently used. Replacing t by
−t we get


(8.10)  $\log\frac{1}{1-t} = t + \tfrac{1}{2}t^2 + \tfrac{1}{3}t^3 + \tfrac{1}{4}t^4 + \cdots.$


Adding the last two formulas we find


(8.11)  $\tfrac{1}{2}\log\frac{1+t}{1-t} = t + \tfrac{1}{3}t^3 + \tfrac{1}{5}t^5 + \cdots.$


For 0 < t < 1, the right-hand member of (8.10) exceeds t but is smaller
than $t + t^2 + t^3 + \cdots = t/(1-t)$. Hence we have the double
inequality

(8.12)  $t < \log\frac{1}{1-t} < \frac{t}{1-t}, \qquad 0 < t < 1.$


Many useful relations and identities will be derived from (8.7) in
section 12. Here we mention only that for any positive integer n we find,
letting t = 1,


(8.13)  $\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n.$


Incidentally, this formula admits of a simple combinatorial
interpretation: The left side represents the number of ways in which a
population of n elements can be divided into two subpopulations if the size
of the first group is permitted to be any number k = 0, 1, ..., n. On
the other hand, such a division can be effected directly by deciding for
each element whether it is to belong to the first or second group. (A
similar argument shows that the multinomial coefficients (4.7) add up
to $k^n$.)
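
The extended definition (8.1)-(8.3) translates directly into a short routine, which makes it easy to test relations such as (8.4), (8.6), and (8.13) on examples. The following is a minimal sketch; the function name `binom` is an assumption, and exact fractions are used so that non-integral x can be handled without rounding.

```python
from fractions import Fraction
from math import factorial

def binom(x, r):
    """Generalized binomial coefficient for any number x and integer r."""
    if r < 0:
        return Fraction(0)          # convention (8.3)
    numerator = Fraction(1)         # (x)_0 = 1
    for i in range(r):
        numerator *= Fraction(x) - i  # builds (x)_r = x (x - 1) ... (x - r + 1)
    return numerator / factorial(r)   # definition (8.2)

print([binom(-1, r) for r in range(5)])   # alternating signs, as in (8.4)
print([binom(-2, r) for r in range(5)])
x = Fraction(-7, 3)
print(binom(x, 2) + binom(x, 3) == binom(x + 1, 3))  # relation (8.6)
print(sum(binom(6, k) for k in range(7)))            # relation (8.13) with n = 6
```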

9. STIRLING’S FORMULA 

An important tool of analytical probability theory is contained in a
classical theorem¹⁴ known as


Stirling’s Formula:
(9.1)  $n! \sim (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}$


where the sign ~ is used to indicate that the ratio of the two sides tends to
unity as $n \to \infty$.

This formula is invaluable for many theoretical purposes and can be 
used also to obtain excellent numerical approximations. It is true that 
the difference of the two sides in (9.1) increases over all bounds, but it 
is the percentage error which really matters. It decreases steadily, and 
Stirling’s approximation is remarkably accurate even for small n. In 
fact, the right side of (9.1) approximates 1! by 0.9221 and 2! by 1.919 
and 5! = 120 by 118.019. The percentage errors are 8 and 4 and 2, 
respectively. For 10! = 3,628,800 the approximation is 3,598,600 with 
an error of 0.8 per cent. For 100! the error is only 0.08 per cent. 
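The figures just quoted can be reproduced in a few lines. The following is a minimal sketch in Python, assuming that ordinary floating-point arithmetic is adequate for n ≤ 100; the function name `stirling` is our own.

```python
import math

def stirling(n):
    """Right side of (9.1): (2*pi)^(1/2) * n^(n + 1/2) * e^(-n)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

for n in (1, 2, 5, 10, 100):
    exact = math.factorial(n)
    approx = stirling(n)
    percent_error = 100 * (exact - approx) / exact
    print(f"n = {n:3d}  approximation = {approx:.6g}  error = {percent_error:.2f} per cent")
```

The difference between the two sides grows without bound while the percentage error shrinks roughly like 100/(12n).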

Proof of Stirling’s formula. We consider 


(9.2)    $a_n = \log 2 + \log 3 + \cdots + \log (n-1) + \tfrac12 \log n$


14 James Stirling, Methodus differentialis, 1730.

Figure 1. Illustrating the derivation of Stirling’s formula and, more generally,
the approximation of sums by integrals. 


which differs from log n! only by the factor ½ in the last term to the
right. We shall show that $a_n$ represents the areas of two different
polygons, and this remark will lead to two bounds for log n!. Figure
1 illustrates the situation for the special case n = 4. On writing


(9.3)    $a_n = \tfrac12\{\log 1 + \log 2\} + \tfrac12\{\log 2 + \log 3\} + \cdots + \tfrac12\{\log (n-1) + \log n\}$


it becomes apparent that $a_n$ equals the area of the trapezoid whose
vertices are the points $A_1, A_2, \ldots, A_n$ of the curve y = log x with
abscissas 1, 2, ..., n and the point (n, 0) of the x-axis. This trape-
zoid being inside the curve, its area is smaller than the area of the
domain bounded by the curve, the x-axis, and the line x = n.

On the other hand, log k equals the area of the trapezoid with basis
k − ½ < x < k + ½ and bounded above by the tangent to the curve
at the point $A_k = (k, \log k)$. It follows that log (n − 1)! is greater than
the area of the domain bounded by y = log x, the x-axis, and the vertical
lines x = 3/2 and x = n − ½. Now ½ log n quite obviously exceeds
the area of the strip n − ½ < x < n under the curve, and hence
$a_n$ exceeds the area under the curve and between x = 3/2 and x = n. In
other words, we have shown that

(9.4)    $\int_{3/2}^{n} \log x \, dx < a_n < \int_{1}^{n} \log x \, dx.$


The indefinite integral of log x is given by x log x − x, and equation
(9.4) reduces to the double inequality

(9.5)    $(n+\tfrac12)\log n - n + \tfrac32\,(1 - \log\tfrac32) < \log n! < (n+\tfrac12)\log n - n + 1.$

Put for abbreviation

(9.6)    $\delta_n = \log n! - (n+\tfrac12)\log n + n.$


Then $1 - \delta_n$ is the difference between the extreme right member of
(9.5) and log n!, that is, $1 - \delta_n$ equals the area of the domain between
the curve y = log x and the polygon $A_1A_2 \cdots A_n$. It follows that $\delta_n$
decreases monotonically. But by (9.5) we have $\tfrac32(1 - \log\tfrac32) < \delta_n < 1$.
We conclude that $\delta_n$ tends to a limit comprised between 1 and
$\tfrac32(1 - \log\tfrac32)$. Denoting this limit by log c we have


(9.7)    $\delta_n \to \log c$  where 2.43 < c < 2.72.


In logarithmic notation Stirling’s formula reduces to (9.7) with $c = (2\pi)^{1/2}$
(or 2.507, approximately). Now π can be defined in many ways, and
for our purposes it is simplest and most natural to define $\pi = c^2/2$.
With this definition we have Stirling’s formula, but it remains to show
that the constant so defined agrees with the more familiar π of other
formulas. This fact will develop as a by-product of other calculations
in chapter VII, and so the proof of Stirling’s formula will be completed
there.
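The behavior of the quantity log n! − (n + ½) log n + n can be observed numerically. The sketch below is an illustration and not part of the proof: it assumes that `math.lgamma(n + 1)` may stand for log n!, and it takes the familiar value of π from the library, so that it illustrates rather than establishes the identification of the limit with ½ log 2π. The function name `delta` is ours.

```python
import math

def delta(n):
    """delta_n = log n! - (n + 1/2) log n + n, with lgamma(n + 1) supplying log n!."""
    return math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n

lower = 1.5 * (1 - math.log(1.5))    # lower bound obtained from the integral from 3/2 to n
limit = 0.5 * math.log(2 * math.pi)  # log c with c = (2*pi)^(1/2)

for n in (1, 2, 5, 10, 100, 1000):
    print(n, delta(n))
print("lower bound:", lower, " limit:", limit)
```

The printed values decrease steadily from 1 toward the limit, staying above the lower bound throughout.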


Refinements. Stirling’s formula can be improved by the addition of further
terms. Although we shall never make use of such refinements, we shall here indi-
cate the proof of the following double inequality¹⁵


(9.8)    $(2\pi)^{1/2}\, n^{n+1/2} e^{-n + 1/(12n+1)} < n! < (2\pi)^{1/2}\, n^{n+1/2} e^{-n + 1/(12n)}.$

To prove (9.8) note that


(9.9)    $\delta_n - \delta_{n+1} = (n+\tfrac12)\log\frac{n+1}{n} - 1 = \frac{1}{3(2n+1)^2} + \frac{1}{5(2n+1)^4} + \cdots$


15 H. Robbins, A remark on Stirling’s formula, American Mathematical Monthly,
vol. 62 (1955), pp. 26-29.

[the last expansion follows from (8.11) on setting t = 1/(2n + 1)]. We increase the
extreme right member in (9.9) by replacing the coefficients 1/5, 1/7, ... by 1/3; this
leads to a geometric series with ratio $(2n+1)^{-2}$, and thus


(9.10)   $\delta_n - \delta_{n+1} < \frac{1}{3[(2n+1)^2 - 1]} = \frac{1}{12n} - \frac{1}{12(n+1)}.$

Accordingly, $\delta_n - 1/(12n)$ increases monotonically. Now the limit of this sequence
is given by Stirling’s formula, and passing to antilogarithms we have the second
inequality in (9.8). The first inequality follows similarly from (9.9) on noticing
that


(9.11)   $\delta_n - \delta_{n+1} > \frac{1}{3(2n+1)^2} > \frac{1}{12n+1} - \frac{1}{12(n+1)+1}.$


The accuracy of the approximations (9.8) is remarkable; even for n = 1 the
formula leads to the two bounds 0.9958... and 1.0023.... The upper bound pro-
vided in (9.8) is slightly better [cf. (12.28)]. For n = 2 it yields 2.0007, for n = 5
we get 120.00..., and for n = 10 the first five significant figures are correct.
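The two bounds are easily tabulated. The following Python sketch evaluates the lower bound, with exponent −n + 1/(12n + 1), and the upper bound, with exponent −n + 1/(12n), in floating-point arithmetic; the function name `robbins_bounds` is our own label.

```python
import math

def robbins_bounds(n):
    """Lower and upper bounds for n! from the double inequality (9.8)."""
    base = math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)
    return base * math.exp(1 / (12 * n + 1)), base * math.exp(1 / (12 * n))

for n in (1, 2, 5, 10):
    lower, upper = robbins_bounds(n)
    print(n, lower, math.factorial(n), upper)
```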


PROBLEMS FOR SOLUTION 


Note: Sections 11 and 12 contain problems of a different character and diverse 
complements to the text. 


10. EXERCISES AND EXAMPLES
Note: Assume in each case that all arrangements have the same probability. 


1. How many different sets of initials can be formed if every person has one 
surname and (a) exactly two given names, (b) at most two given names, (c) 
at most three given names? 

2. In how many ways can two rooks of different colors be put on a chess- 
board so that they can take each other? 

3. Letters in the Morse code are formed by a succession of dashes and dots 
with repetitions permitted. How many letters is it possible to form with ten 
symbols or less? 

4. Each domino piece is marked by two numbers. The pieces are symmetri- 
cal so that the number-pair is not ordered. How many different pieces can be 
made using the numbers 1, 2, ..., n? 

5. The numbers 1, 2, ..., n are arranged in random order. Find the proba- 
bility that the digits (a) 1 and 2, (b) 1, 2, and 3, appear as neighbors in the 
order named. 

6. (a) Find the probability that among three random digits there occur 2,1, 
or 0 repetitions. (b) Do the same for four random digits. 

7. Find the probabilities $p_r$ that in a sample of r random digits no two are
equal. Estimate the numerical value of $p_{10}$, using Stirling’s formula.

8. What is the probability that among k random digits (a) 0 does not appear; 
(b) 1 does not appear; (c) neither 0 nor 1 appears; (d) at least one of the two 
digits 0 and 1 does not appear? Let A and B represent the events in (a) and
(b). Express the other events in terms of A and B.

9. If n balls are placed at random into n cells, find the probability that
exactly one cell remains empty.

10. At a parking lot there are twelve places arranged in a row. A man ob- 
served that there were eight cars parked, and that the four empty places were 
adjacent to each other (formed one run). Given that there are four empty 
places, is this arrangement surprising (indicative of non-randomness)? 

11. A man is given n keys of which only one fits his door. He tries them 
successively (sampling without replacement). This procedure may require 1, 
2, ..., n trials. Show that each of these n outcomes has probability $n^{-1}$.

12. Suppose that each of n sticks is broken into one long and one short part. 
The 2n parts are arranged into n pairs from which new sticks are formed. 
Find the probability (a) that the parts will be joined in the original order, (b) 
that all long parts are paired with short parts.¹⁶

13. Testing a statistical hypothesis. A Cornell professor got a ticket twelve 
times for illegal overnight parking. All twelve tickets were given either 
Tuesdays or Thursdays. Find the probability of this event. (Was his renting 
a garage only for Tuesdays and Thursdays justified?) 

14. Continuation. Of twelve police tickets none was given on Sunday. Is 
this evidence that no tickets are given on Sundays? 

15. A box contains ninety good and ten defective screws. If ten screws are 
used, what is the probability that none is defective? 

16. From the population of five symbols a, b, c, d, e, a sample of size 25 is 
taken. Find the probability that the sample will contain five symbols of each 
kind. Check the result in tables of random numbers,¹⁷ identifying the digits
0 and 1 with a, the digits 2 and 3 with b, etc. 

17. If n men, among whom are A and B, stand in a row, what is the probability
that there will be exactly r men between A and B? If they stand in a ring
instead of in a row, show that the probability is independent of r and hence 
1/(n — 1). (In the circular arrangement consider only the arc leading from 
A to B in the positive direction.) 

18. What is the probability that two throws with three dice each will show 
the same configuration if (a) the dice are distinguishable, (b) they are not? 

19. Show that it is more probable to get at least one ace with four dice than 
at least one double ace in 24 throws of two dice. (The answer is known as 
de Méré’s paradox. Chevalier de Méré, a gambler, thought that the two 
probabilities ought to be equal and blamed mathematics for his losses.) 

20. From a population of n elements a sample of size r is taken. Find the 
probability that none of N prescribed elements will be included in the sample, 


16 When cells are exposed to harmful radiation, some chromosomes break and 
play the role of our “sticks.” The “long” side is the one containing the so-called 
centromere. If two “long” or two “short” parts unite, the cell dies. See D. G. 
Catcheside, The effect of X-ray dosage upon the frequency of induced structural 
changes in the chromosomes of Drosophila Melanogaster, Journal of Genetics, vol. 
36 (1938), pp. 307-320. 

17 They are occasionally extraordinarily obliging: see J. A. Greenwood and E. B.
Stuart, Review of Dr. Feller's critique, Journal for Parapsychology, vol. 4 (1940),
pp. 298-319, in particular p. 306.

assuming the sampling to be (a) without, (b) with replacement. Compare
the numerical values for the two methods when (i) n = 100, r = N = 3, and 
(ii) n = 100, r = N = 10. 

21. Spread of rumors. In a town of n+ 1 inhabitants, a person tells a 
rumor to a second person, who in turn repeats it to a third person, etc. At 
each step the recipient of the rumor is chosen at random from the n people 
available. Find the probability that the rumor will be told r times without: 
(a) returning to the originator, (b) being repeated to any person. Do the same 
problem when at each step the rumor is told to a gathering of N randomly 
chosen people. (The first question is the special case N = 1.) 

22. Chain letters. In a population of n + 1 people a man, the “progenitor,” 
sends out letters to two persons, the “first generation.” These repeat the per- 
formance and, generally, each member of the rth generation sends out letters 
to two persons chosen at random. Find the probability that the generations 
number 1, 2, ..., r will not include the progenitor. Find the median of the 
distribution, supposing n to be large.

23. A familiar problem. In a certain family four girls take turns at washing 
dishes. Out of a total of four breakages, three were caused by the youngest 
girl, and she was thereafter called clumsy. Was she justified in attributing 
the frequency of her breakages to chance? Discuss the connection with ran- 
dom placements of balls. 

24. What is the probability that (a) the birthdays of twelve people will fall
in twelve different calendar months (assume equal probabilities for the twelve 
months), (b) the birthdays of six people will fall in exactly two calendar months? 

25. Given thirty people, find the probability that among the twelve months 
there are six containing two birthdays and six containing three. 

26. A closet contains n pairs of shoes. If 2r shoes are chosen at random 
(with 2r < n), what is the probability that there will be (a) no complete pair, 
(b) exactly one complete pair, (c) exactly two complete pairs among them? 

27. A car is parked among N cars in a row, not at either end. On his return 
the owner finds that exactly r of the N places are still occupied. What is the 
probability that both neighboring places are empty? 

28. A group of 2N boys and 2N girls is divided into two equal groups. Find 
the probability p that each group will be equally divided into boys and girls. 
Estimate p, using Stirling’s formula. 

29. In bridge, prove that the probability p of West’s receiving exactly k
aces is the same as the probability that an arbitrary hand of thirteen cards 
contains exactly k aces. (This is intuitively clear. Note, however, that the 
two probabilities refer to two different experiments, since in the second case 
thirteen cards are chosen at random and in the first case all 52 are distributed.) 

30. The probability that in a bridge game East receives m and South n 
spades is the same as the probability that of two hands of thirteen cards each, 
drawn at random from a deck of bridge cards, the first contains m and the 
second n spades. 

31. What is the probability that the bridge hands of North and South to- 
gether contain exactly k aces, where k = 0, 1, 2, 3, 4? 

32. Let a, b, c, d be four non-negative integers such that a + b + c + d = 13.
Find the probability p(a, b, c, d) that in a bridge game the players North, East,
South, West have a, b, c, d spades, respectively. Formulate a scheme of plac-
ing red and black balls into cells that contains the problem as a special case.

33. Using the result of problem 32, find the probability that some player
receives a, another b, a third c, and the last d spades if (a) a = 5, b = 4, c = 3,
d = 1; (b) a = b = c = 4, d = 1; (c) a = b = 4, c = 3, d = 2.

Note that the three cases are essentially different. 

34. Let a, b, c, d be integers with a + b + c + d = 13. Find the probabil-
ity q(a, b, c, d) that a hand at bridge will consist of a spades, b hearts, c dia-
monds, and d clubs and show that the problem does not reduce to one of plac-
ing, at random, thirteen balls into four cells. Why?

35. Distribution of aces among r bridge cards. Calculate the probabilities
$p_0(r), p_1(r), \ldots, p_4(r)$ that among r bridge cards drawn at random there are
0, 1, ..., 4 aces, respectively. Verify that $p_0(r) = p_4(52 - r)$.

36. Continuation: waiting times. If the cards are drawn one by one, find
the probabilities $f_1(r), \ldots, f_4(r)$ that the first, ..., fourth ace turns up at the
rth trial. Guess at the medians of the waiting times for the first, ..., fourth 
ace and then calculate them. 

37. Find the probability that each of two hands contains exactly k aces if 
the two hands are composed of r bridge cards each, and are drawn (a) from 
the same deck, (b) from two decks. Show that when r = 13 the probability 
in part (a) is the probability that two preassigned bridge players receive exactly 
k aces each.

38. Misprints. Each page of a book contains N symbols, possibly mis-
prints. The book contains n = 500 pages and r = 50 misprints. Show that
(a) the probability that pages number 1, 2, ..., n contain, respectively,
$r_1, r_2, \ldots, r_n$ misprints equals


$\binom{N}{r_1}\binom{N}{r_2}\cdots\binom{N}{r_n} \div \binom{nN}{r};$


(b) for large N this probability may be approximated by (5.5). Conclude that
the r misprints are distributed in the n pages approximately in accordance with a
random distribution of r balls in n cells. (Note. This may be restated as a
general limiting property of Fermi-Dirac statistics. Cf. section 5.)


Note: The following problems refer to the material of section 5. 


39. If $r_1$ indistinguishable things of one kind and $r_2$ indistinguishable things
of a second kind are placed into n cells, find the number of distinguishable
arrangements.

40. If $r_1$ dice and $r_2$ coins are thrown, how many results can be distinguished?

41. In how many different distinguishable ways can $r_1$ white, $r_2$ black, and $r_3$
red balls be arranged?

42. Find the probability that in a random arrangement of 52 bridge cards 
no two aces are adjacent. 

43. Elevator. In the example (3.c) the elevator starts with seven passen-
gers and stops at ten floors. The various arrangements of discharge may be
denoted by symbols like (3, 2, 2), to be interpreted as the event that three
passengers leave together at a certain floor, two other passengers at another
floor, and the last two at still another floor. Find the probabilities of the fifteen
possible arrangements ranging from (7) to (1, 1, 1, 1, 1, 1, 1).

44. Birthdays. Find the probabilities for the various configurations of the 
birthdays of 22 people. 

45. Find the probability for a poker hand to be a (a) royal flush (ten, jack, 
queen, king, ace in a single suit); (b) four of a kind (four cards of equal face 
values); (c) full house (one pair and one triple of cards with equal face values); 
(d) straight (five cards in sequence regardless of suit); (e) three of a kind (three 
equal face values plus two extra cards); (f) two pairs (two pairs of equal face 
values plus one other card); (g) one pair (one pair of equal face values plus 
three different cards). 
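Several of the numerical statements in these exercises can be checked by direct calculation. As an illustration, the comparison asserted in problem 19 (de Méré’s paradox) is sketched below in Python with exact fractions; the variable names are ours, and the sketch assumes fair dice and independent throws.

```python
from fractions import Fraction

# probability of at least one ace among four dice
p_one_ace = 1 - Fraction(5, 6) ** 4
# probability of at least one double ace in 24 throws of two dice
p_double_ace = 1 - Fraction(35, 36) ** 24

print(float(p_one_ace), float(p_double_ace))
```

The first probability lies slightly above one-half and the second slightly below, which is the whole content of the paradox.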


11. PROBLEMS AND COMPLEMENTS OF A THEORETICAL
CHARACTER

1. A population of n elements includes np red ones and nq black ones
(p + q = 1). A random sample of size r is taken with replacement. Show
that the probability of its including exactly k red elements is


(11.1)   $\binom{r}{k} p^k q^{r-k}.$


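2. A limit theorem for the hypergeometric distribution. If n is large and
$n_1/n = p$, then the probability $q_k$ given by (6.1) and (6.2) is close to (11.1).
More precisely,


(11.2)   $\binom{r}{k}\left(p - \frac{k}{n}\right)^{k}\left(q - \frac{r-k}{n}\right)^{r-k} < q_k < \binom{r}{k} p^k q^{r-k}\left(1 - \frac{r}{n}\right)^{-r}.$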


A comparison of this and the preceding problem shows: For large populations 
there is practically no difference between sampling with or without replacement. 

3. A random sample of size r without replacement is taken from a population
of n elements. The probability $u_r$ that N given elements will all be included
in the sample is


(11.3)   $u_r = \binom{n-N}{r-N} \div \binom{n}{r}.$


(The corresponding formula for sampling with replacement is given by (11.10)
and cannot be derived by a direct argument. For an alternative form of (11.3)
cf. problem IV, 9.)

4. Limiting form. If n → ∞ and r → ∞ so that r/n → p, then $u_r \to p^N$
(cf. problem 13).


Note: Problems 5-13 refer to the classical occupancy problem (Maxwell-Boltzmann
statistics): That is, r balls are distributed among n cells and each of the $n^r$ possible
distributions has probability $n^{-r}$.¹⁸


18 Problems 5-19 play a role in quantum statistics, the theory of photographic 
plates, G-M counters, etc. The formulas are therefore frequently discussed and 
discovered in the physical literature, usually without a realization of their classical 
and essentially elementary character. Probably all the problems occur (although 
in modified form) in the book by Whitworth quoted at the opening of this chapter.

5. The probability $p_k$ that a given cell contains exactly k balls is given by
the binomial distribution (4.5). The most probable number is the integer ν
such that (r − n + 1)/n < ν ≤ (r + 1)/n. (In other words, it is asserted that
$p_0 < p_1 < \cdots < p_{\nu-1} \le p_\nu > p_{\nu+1} > \cdots > p_r$; cf. problem 15.)

6. Limiting form. If n → ∞ and r → ∞ so that the average number
λ = r/n of balls per cell remains constant, then

(11.4)   $p_k \to e^{-\lambda} \lambda^k / k!.$


This is the Poisson distribution, discussed in chapter VI; see problem 16. 
7. Let A(r, n) be the number of distributions leaving none of the n cells
empty. Show by a combinatorial argument that


(11.5)   $A(r, n+1) = \sum_{k=1}^{r} \binom{r}{k} A(r-k, n).$

Conclude that

(11.6)   $A(r, n) = \sum_{\nu=0}^{n} (-1)^{\nu} \binom{n}{\nu} (n-\nu)^{r}.$


Hint: Use induction; assume (11.6) to hold and express A(r − k, n) in
(11.5) accordingly. Change the order of summation and use the binomial
formula to express A(r, n + 1) as the difference of two simple sums. Replace
in the second sum ν + 1 by a new index of summation and use (8.6).


Note: Formula (11.6) provides a theoretical solution to an old problem but obviously
it would be a thankless task to use it for the calculation of the probability x, say, that in
a village of r = 1900 people every day of the year is a birthday. In chapter IV, section
2, we shall derive (11.6) by another method and obtain a simple approximation formula
(showing, e.g., that x = 0.135, approximately).
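With exact integer arithmetic on a machine the alternating sum for A(r, n), namely the sum over ν of (−1)^ν C(n, ν)(n − ν)^r, can be evaluated directly. The Python sketch below is an illustration under the assumption of a year of n = 365 equally likely birthdays; the function names are ours. It also spot-checks the recursion A(r, n + 1) = Σ C(r, k) A(r − k, n) for small arguments.

```python
from fractions import Fraction
from math import comb

def count_no_empty_cell(r, n):
    """A(r, n): number of placements of r balls in n cells leaving no cell empty."""
    return sum((-1) ** v * comb(n, v) * (n - v) ** r for v in range(n + 1))

def prob_no_empty_cell(r, n):
    """Probability n^(-r) A(r, n), kept as an exact fraction."""
    return Fraction(count_no_empty_cell(r, n), n ** r)

# spot check of the recursion for r = 5, n = 2
r, n = 5, 2
assert count_no_empty_cell(r, n + 1) == sum(
    comb(r, k) * count_no_empty_cell(r - k, n) for k in range(1, r + 1)
)

x = float(prob_no_empty_cell(1900, 365))
print(x)
```

The alternating terms are individually enormous, which is why exact integers rather than floating-point numbers are used until the final division.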


8. Show that the number of distributions leaving exactly m cells empty is

(11.7)   $E_m(r, n) = \binom{n}{m} A(r, n-m) = \binom{n}{m} \sum_{\nu=0}^{n-m} (-1)^{\nu} \binom{n-m}{\nu} (n-m-\nu)^{r}.$


9. Show without using the preceding results that the probability
$p_m(r, n) = n^{-r} E_m(r, n)$
of finding exactly m cells empty satisfies


(11.8)   $p_m(r+1, n) = p_m(r, n)\,\frac{n-m}{n} + p_{m+1}(r, n)\,\frac{m+1}{n}.$


10. Using the results of problems 7 and 8, show by direct calculation that 
(11.8) holds. Show that this method provides a new derivation (by induction 
on r) of (11.6). 

11. From (11.6) and problem 8 conclude that the probability of finding m
or more cells empty is


(11.9)   $\binom{n}{m} \sum_{\nu=0}^{n-m} (-1)^{\nu} \binom{n-m}{\nu} \left(1 - \frac{m+\nu}{n}\right)^{r} \frac{m}{m+\nu}.$


(For m > n this expression reduces to zero, as is proper.)

12. The probability that each of N given cells is occupied is


(11.10)  $u(r, n) = n^{-r} \sum_{k=0}^{r} \binom{r}{k} A(k, N)\,(n-N)^{r-k}.$

Conclude that

(11.11)  $u(r, n) = \sum_{\nu=0}^{N} (-1)^{\nu} \binom{N}{\nu} \left(1 - \frac{\nu}{n}\right)^{r}.$


(Use the binomial theorem. For N = n we have $u(r, n) = n^{-r} A(r, n)$.
Note that (11.11) is the analogue of (11.3) for sampling with replacement.¹⁹
For an alternative derivation see problem IV, 8.)

13. Limiting form. For the passage to the limit described in problem 4 one
has $u(r, n) \to (1 - e^{-p})^N$.
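The probability u(r, n) of problem 12 lends itself to direct computation. The Python sketch below evaluates the alternating sum Σ (−1)^ν C(N, ν)(1 − ν/n)^r with exact fractions and applies it, with N = n = 10, to the waiting time for a complete set of the ten random digits. The function names are ours, and the median is taken here to mean the smallest r for which the probability of completion within r trials reaches one-half.

```python
from fractions import Fraction
from math import comb

def prob_all_occupied(r, n, N):
    """u(r, n): probability that N given cells are all occupied after r balls."""
    return sum((-1) ** v * comb(N, v) * Fraction(n - v, n) ** r for v in range(N + 1))

def median_waiting_time(n):
    """Smallest r at which all n cells are occupied with probability at least 1/2."""
    r = n
    while prob_all_occupied(r, n, n) < Fraction(1, 2):
        r += 1
    return r

print(median_waiting_time(10))
print(float(1 - prob_all_occupied(50, 10, 10)))
print(float(1 - prob_all_occupied(75, 10, 10)))
```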


Note: In problems 14-19 r and n have the same meaning as above, but we assume 
that the balls are indistinguishable and that all distinguishable arrangements have 
equal probabilities (Bose-Einstein statistics). 


14. The probability that a given cell contains exactly k balls is

(11.12)  $q_k = \binom{n+r-k-2}{r-k} \div \binom{n+r-1}{r}.$


15. Show that when n > 2 zero is the most probable number of balls in
any specified cell, or more precisely, $q_0 > q_1 > \cdots$ (cf. problem 5).

16. Limit theorem. Let n → ∞ and r → ∞, so that the average number of
particles per cell, r/n, tends to λ. Then


(11.13)  $q_k \to \frac{\lambda^k}{(1+\lambda)^{k+1}}.$

(The right side is known as the geometric distribution.)

17. The probability that exactly m cells remain empty is


(11.14)  $\binom{n}{m}\binom{r-1}{n-m-1} \div \binom{n+r-1}{r}.$


19 Note that u(r, n) may be interpreted as the probability that the waiting time 
up to the moment when the Nth element joins the sample is less than r. The 
result may be applied to random sampling digits: here u(r, 10) — u(r — 1, 10) is 
the probability that a sequence of r elements must be observed to include the 
complete set of all ten digits. This can be used as a test of randomness. R. E. 
Greenwood (Coupon collector’s test for random digits, Mathematical Tables and 
Other Aids to Computation, vol. 9 (1955), pp. 1-5) has tabulated the distribution and 
compared it to actual counts for the corresponding waiting times for the first 2035 
decimals of π and the first 2486 decimals of e. The median of the waiting time for
a complete set of all ten digits is 27. The probability that this waiting time exceeds
50 is greater than 0.05, and the probability of the waiting time exceeding 75 is
about 0.0037.

18. The probability that a group of m prescribed cells contains a total of
exactly j balls is


(11.15)  $q_j(m) = \binom{m+j-1}{m-1}\binom{n-m+r-j-1}{r-j} \div \binom{n+r-1}{r}.$

19. Limiting form. For the passage to the limit of problem 4 we have

(11.16)  $q_j(m) \to \binom{m+j-1}{m-1} \frac{p^j}{(1+p)^{m+j}}.$
(The right side is a special case of the negative binomial distribution to be intro- 
duced in chapter VI.) 


Theorems on Runs. In problems 20-25 we consider arrangements of $r_1$ alphas
and $r_2$ betas and assume that all arrangements are equally probable [see example (4.d)].
This group of problems refers to section 5a.


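20. The probability that the arrangement contains exactly k runs of either
kind is


(11.17)  $P_{2\nu} = 2\binom{r_1-1}{\nu-1}\binom{r_2-1}{\nu-1} \div \binom{r_1+r_2}{r_1}$


when k = 2ν is even, and


(11.18)  $P_{2\nu+1} = \left\{\binom{r_1-1}{\nu}\binom{r_2-1}{\nu-1} + \binom{r_1-1}{\nu-1}\binom{r_2-1}{\nu}\right\} \div \binom{r_1+r_2}{r_1}$

when k = 2ν + 1 is odd.

21. Continuation. Conclude that the most probable number of runs is an
integer k such that $\frac{2r_1r_2}{r_1+r_2} < k < \frac{2r_1r_2}{r_1+r_2} + 3$. (Hint: Consider the ratios
$P_{2\nu+2} \div P_{2\nu}$ and $P_{2\nu+1} \div P_{2\nu-1}$.)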

22. The probability that the arrangement starts with an alpha run of length
ν ≥ 0 is $(r_1)_\nu\, r_2 \div (r_1+r_2)_{\nu+1}$. (Hint: Choose the ν alphas and the beta which
must follow it.) What does the theorem imply for ν = 0?


23. The probability of having exactly k runs of alphas is


(11.19)  $\binom{r_1-1}{k-1}\binom{r_2+1}{k} \div \binom{r_1+r_2}{r_1}.$

Hint: This follows easily from the second part of the lemma of section 5.
Alternatively, equation (11.19) may be derived from (11.17) and (11.18), but
this procedure is more laborious.


24. The probability that the nth alpha is preceded by exactly m betas is

(11.20)  $\binom{r_1+r_2-n-m}{r_2-m}\binom{m+n-1}{m} \div \binom{r_1+r_2}{r_1}.$


25. The probability for the alphas to be arranged in k runs of which $k_1$
are of length 1, $k_2$ of length 2, ..., $k_\nu$ of length ν (with $k_1 + \cdots + k_\nu = k$) is


(11.21)  $\frac{k!}{k_1!\,k_2! \cdots k_\nu!}\binom{r_2+1}{k} \div \binom{r_1+r_2}{r_1}.$
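The run formulas of problem 20 can be tested against a complete enumeration when $r_1$ and $r_2$ are small. In the Python sketch below (function names ours) an even number k = 2ν of runs is counted as 2 C(r1 − 1, ν − 1) C(r2 − 1, ν − 1) arrangements, and an odd number k = 2ν + 1 as C(r1 − 1, ν) C(r2 − 1, ν − 1) + C(r1 − 1, ν − 1) C(r2 − 1, ν); the sketch assumes that both kinds of symbols are present, so that k ≥ 2.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def run_count_distribution(r1, r2):
    """Distribution of the total number of runs, found by listing all arrangements."""
    total = comb(r1 + r2, r1)
    counts = {}
    for alpha_positions in combinations(range(r1 + r2), r1):
        places = set(alpha_positions)
        seq = [i in places for i in range(r1 + r2)]
        # a new run starts wherever a symbol differs from its predecessor
        runs = 1 + sum(seq[i] != seq[i - 1] for i in range(1, len(seq)))
        counts[runs] = counts.get(runs, 0) + 1
    return {k: Fraction(c, total) for k, c in counts.items()}

def run_probability(k, r1, r2):
    """Probability of exactly k >= 2 runs, by the formulas for even and odd k."""
    v, odd = divmod(k, 2)
    if odd:
        ways = comb(r1 - 1, v) * comb(r2 - 1, v - 1) + comb(r1 - 1, v - 1) * comb(r2 - 1, v)
    else:
        ways = 2 * comb(r1 - 1, v - 1) * comb(r2 - 1, v - 1)
    return Fraction(ways, comb(r1 + r2, r1))

observed = run_count_distribution(4, 3)
for k in range(2, 8):
    assert run_probability(k, 4, 3) == observed.get(k, 0)
print(sorted(observed.items()))
```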


12. PROBLEMS AND IDENTITIES INVOLVING BINOMIAL
COEFFICIENTS


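1. For integral n ≥ 2


(12.1)   $\binom{n}{0} - \binom{n}{1} + - \cdots = 0,$
         $\binom{n}{1} + 2\binom{n}{2} + 3\binom{n}{3} + \cdots = n2^{n-1},$
         $\binom{n}{1} - 2\binom{n}{2} + 3\binom{n}{3} - + \cdots = 0,$
         $2\cdot 1\binom{n}{2} + 3\cdot 2\binom{n}{3} + 4\cdot 3\binom{n}{4} + \cdots = n(n-1)2^{n-2}.$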


(Hint: Use the binomial formula.) 
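2. Prove that for positive integers n, k


(12.2)   $\binom{n}{0}\binom{n}{k} - \binom{n}{1}\binom{n-1}{k-1} + \binom{n}{2}\binom{n-2}{k-2} - + \cdots \pm \binom{n}{k}\binom{n-k}{0} = 0.$


More generally²⁰


(12.3)   $\sum_{\nu} \binom{n}{\nu}\binom{n-\nu}{k-\nu} t^{\nu} = \binom{n}{k}(1+t)^{k}.$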


3. For any a > 0


(12.4)   $\binom{-a}{k} = (-1)^k \binom{a+k-1}{k}.$

If a is an integer, this can be proved also by differentiation of the geometric
series $\sum x^k = (1-x)^{-1}$.

4. Prove that


(12.5)   $\binom{2n}{n} 2^{-2n} = (-1)^n \binom{-\tfrac12}{n}.$

5. For integral non-negative n and r and all real a

(12.6)   $\sum_{\nu=0}^{n} \binom{a-\nu}{r} = \binom{a+1}{r+1} - \binom{a-n}{r+1}.$


(Hint: Use equation (8.6). The special case n = a is frequently used.)

6. For arbitrary a and integral n ≥ 0


(12.7)   $\sum_{\nu=0}^{n} (-1)^{\nu} \binom{a}{\nu} = (-1)^{n} \binom{a-1}{n}.$


[Hint: Use equation (8.6).]


20 The reader is reminded of the convention (8.5): if ν runs through all integers,
only finitely many terms in the sum in (12.3) are different from zero.

7. For positive integers r, k

(12.8)   $\sum_{\nu=0}^{r} \binom{\nu+k-1}{k-1} = \binom{r+k}{k}.$


(a) Prove this using (8.6). (b) Show that (12.8) is a special case of (12.7). (c)
Show by an inductive argument that (12.8) leads to a new proof of the first
part of the lemma of section 5.

8. In section 6 we remarked that the terms of the hypergeometric distribu-
tion should add to unity. This amounts to saying that for any positive integers
a, b, n,


(12.9)   $\binom{a}{0}\binom{b}{n} + \binom{a}{1}\binom{b}{n-1} + \cdots + \binom{a}{n}\binom{b}{0} = \binom{a+b}{n}.$


Prove this by induction. (Hint: Prove first that equation (12.9) holds for
a = 1 and all b.)

9. Continuation. By a comparison of the coefficients of $t^n$ on both sides of


(12.10)  $(1+t)^a (1+t)^b = (1+t)^{a+b}$


prove more generally that (12.9) is true for arbitrary numbers a, b (and in- 
tegral n). 


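10. Using equation (12.9), prove that

(12.11)  $\binom{n}{0}^2 + \binom{n}{1}^2 + \binom{n}{2}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}.$

11. Using equation (12.10), prove that


(12.12)  $\sum_{\nu=0}^{n} \frac{(2n)!}{(\nu!)^2 \{(n-\nu)!\}^2} = \binom{2n}{n}^2.$


12. Prove that for integers 0 < a < b

(12.13)  $\sum_{k=1}^{a} (-1)^{a-k} \binom{a}{k}\binom{b+k}{b+1} = \binom{b}{a-1}.$


Hint: Using (12.4) show that (12.13) is a special case of (12.9). Alternatively,
compare coefficients of $t^{a-1}$ in $(1-t)^a (1-t)^{-b-2} = (1-t)^{a-b-2}$.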
13. By specialization derive from (12.9) the identities 


(12.14) GG 2s) +— ain Po! 


an 
alee 


valid if k, n, and r are positive integers. [Hint: Use (12.4).]

14. Using equation (12.9), prove that


(12.16)  $\sum_{j=0}^{k} \binom{a+k-j-1}{k-j}\binom{b+j-1}{j} = \binom{a+b+k-1}{k}.$


(Hint: Apply equation (12.4) back and forth.) Note the important special
cases b = 1, 2.

15. Referring to the problems of section 11, notice that equations (11.12), 
(11.14), (11.15), and (11.16) define probabilities. In each the quantities should
therefore add to unity. Show that this is implied, respectively, by (12.8), 
(12.9), (12.16), and the binomial theorem. 

16. From the definition of A(r, n) in problem 7 of section 11 it follows that
A(r, n) = 0 if r < n and A(n, n) = n!. In other words


(12.17)  $\sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} k^{r} = \begin{cases} 0 & \text{if } r < n, \\ n! & \text{if } r = n. \end{cases}$


(a) Prove (12.17) directly by reduction from n to n — 1. (b) Next prove 
(12.17) by considering the rth derivative of (1 — e‘)" at t = 0. (c) Generalize 
(12.17) by starting from (11.11) instead of (11.6). 

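17. If 0 ≤ N ≤ n prove by induction that for each integer r ≥ 0


(12.18)  $\sum_{\nu=0}^{N} (-1)^{\nu} \binom{N}{\nu} (n-\nu)_r = \binom{n-N}{r-N}\, r!.$


(Note that the right-hand member vanishes when r < N and when r > n.)
Verify (12.18) by considering the rth derivative of $t^{n-N}(t-1)^N$ at t = 1.

18. Prove by induction (using the binomial theorem)


(12.19)  $\binom{n}{1}\frac{1}{1} - \binom{n}{2}\frac{1}{2} + - \cdots + (-1)^{n-1}\binom{n}{n}\frac{1}{n} = 1 + \frac12 + \cdots + \frac1n.$


Verify (12.19) by integrating the identity $\sum_{\nu=0}^{n-1} (1-t)^{\nu} = \{1 - (1-t)^{n}\}\, t^{-1}$.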
19. Show that for any positive integer m 


(12.20)  $(x + y + z)^m = \sum \frac{m!}{a!\,b!\,c!}\, x^a y^b z^c$

where the summation extends over all non-negative integers a, b, c, such that
a + b + c = m.
20. Using Stirling’s formula, prove that


(12.21)  $\binom{2n}{n} \sim (\pi n)^{-1/2}\, 2^{2n}.$


21. Prove that for any positive integers a and b

(12.22)  $\frac{(a+1)(a+2)\cdots(a+n)}{(b+1)(b+2)\cdots(b+n)} \sim \frac{b!}{a!}\, n^{a-b}.$

22. The gamma function is defined by

(12.23)  $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$


where x > 0. Show that $\Gamma(x) \sim (2\pi)^{1/2} e^{-x} x^{x-1/2}$. (Notice that if x = n is an
integer, Γ(n) = (n − 1)!.)

23. Let a and r be arbitrary positive numbers and n a positive integer.
Show that


(12.24)  $a(a+r)(a+2r)\cdots(a+nr) \sim C\, r^{n+1} n^{n+1/2+a/r} e^{-n}.$

[The constant C is equal to $(2\pi)^{1/2}/\Gamma(a/r)$.]
24. Using the results of the preceding problem, show that 

aa SEMEL EUD ay 
25. Prove the following alternative form of Stirling’s formula:

(12.26)  $n! \sim (2\pi)^{1/2} (n+\tfrac12)^{n+1/2} e^{-(n+1/2)}.$

26. Continuation. Using the method of the text, show that

(12.27)  $(2\pi)^{1/2} (n+\tfrac12)^{n+1/2} e^{-(n+1/2) - 1/24(n+1/2)} < n! < (2\pi)^{1/2} (n+\tfrac12)^{n+1/2} e^{-(n+1/2)}.$

27. Extending Stirling’s formula, prove that


(12.28)  $n! \sim (2\pi)^{1/2} n^{n+1/2} \exp\left\{-n + \frac{1}{12n} - \frac{1}{360n^3} + - \cdots\right\}.$
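Identities between binomial coefficients lend themselves to a mechanical spot check for small integer arguments. Such a check is no substitute for the proofs asked for in this section, but it guards against misprints. The Python sketch below (function names ours) tests three of the identities: the sum of k C(n, k) equals n 2^(n−1); the sum of C(a, k) C(b, n − k) over k equals C(a + b, n); and the sum of the squares C(n, k)² equals C(2n, n).

```python
from math import comb

def weighted_sum(n):
    """Sum of k * C(n, k) for k = 1..n."""
    return sum(k * comb(n, k) for k in range(1, n + 1))

def convolution_sum(a, b, n):
    """Sum of C(a, k) * C(b, n - k) for k = 0..n."""
    return sum(comb(a, k) * comb(b, n - k) for k in range(n + 1))

def squares_sum(n):
    """Sum of C(n, k)^2 for k = 0..n."""
    return sum(comb(n, k) ** 2 for k in range(n + 1))

for n in range(2, 10):
    assert weighted_sum(n) == n * 2 ** (n - 1)
    assert squares_sum(n) == comb(2 * n, n)
    for a in range(1, 7):
        for b in range(1, 7):
            assert convolution_sum(a, b, n) == comb(a + b, n)
print("all checks passed")
```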




CHAPTER III* 


Fluctuations in Coin Tossing 


and Random Walks


This chapter serves two purposes. First, it will show that exceed- 
ingly simple methods may lead to far-reaching and important results. 
Second, in it we shall for the first time encounter theoretical conclusions 
which not only are unexpected but actually come as a shock to intuition 
and common sense. They will reveal that commonly accepted notions 
concerning chance fluctuations are without foundation and that the 
implications of the law of large numbers are widely misconstrued. 

The discussion is inserted at this place only because of its elementary 
character; the main topic of the book continues in chapter V. The 
entire book is independent of the present chapter. Some of the formulas 
will reappear later in connection with first passages and recurrence, but 
they will be derived anew by analytical methods. A comparison of
methods should prove instructive and interesting. Accordingly, the 
present chapter should be read at the reader’s discretion independently of, 
or parallel to, the remainder of the book. To facilitate such a procedure, 
this chapter may be read in two versions: the main text appears in 
ordinary type. Passages in small type cover additional topics (refer- 
ring mainly to first passage and recurrence phenomena) and should be 
omitted at first reading. Section 7 contains an empirical illustration. 


* This chapter may be omitted or read in conjunction with the following chapters. 
Reference to its contents will be made in chapters X (laws of large numbers), XI 
(first-passage times), XIII (recurrent events), XIV (random walks), but the contents
will not be used explicitly in the sequel.

1 Although we are dealing formally only with coin tossing, the basic conclusions 
are widely applicable. In fact, E. Sparre Andersen has made the surprising dis- 
covery that many facets of the fluctuation theory of sums of independent random 
variables are of a purely combinatorial nature and are common to a huge class of 
such variables. This is true, in particular, of the two arc sine laws. See Mathe-
matica Scandinavica, vol. 1 (1953), pp. 263-285, and vol. 2 (1954), pp. 195-223.
1. GENERAL ORIENTATION


A surprising wealth of information concerning chance fluctuations in 
general will be derived from the following inconspicuous lemma an- 
nounced in 1887 by Bertrand. Similar problems of arrangements have 
attracted the interest of students of combinatorial analysis under the 
name of ballot problems.² Suppose that, in a ballot, candidate P scores p
votes and candidate Q scores q votes, where p > q. The probability that
throughout the counting there are always more votes for P than for Q equals


$(p - q)/(p + q)$.


In mathematical language we are here concerned with arrangements
of $x = p + q$ symbols $\epsilon_1, \epsilon_2, \ldots, \epsilon_x$ consisting of p plus ones (votes for
P) and q minus ones (votes for Q). The partial sum
$s_k = \epsilon_1 + \epsilon_2 + \cdots + \epsilon_k$ is the number of votes by which P leads, or trails, just after
the kth vote is cast. Clearly $s_x = p - q$ and


(1.1) $s_i - s_{i-1} = \epsilon_i = \pm 1, \quad s_0 = 0 \quad (i = 1, 2, \ldots, x)$.


Conversely, every arrangement $\{s_1, s_2, \ldots, s_x\}$ of integers satisfying
(1.1) represents a potential voting record. We shall use a geometrical
terminology and represent such an arrangement by a polygonal line
whose ith side has slope $\epsilon_i$ and whose ith vertex has ordinate $s_i$. Such
lines will be called paths.


Definition. Let $x > 0$ and $y$ be integers. A path $\{s_1, s_2, \ldots, s_x\}$
from the origin to the point (x, y) is a polygonal line whose vertices have
abscissas 0, 1, 2, ..., x and ordinates $s_0, s_1, s_2, \ldots, s_x$ satisfying (1.1)
with $s_x = y$.

If p among the $\epsilon_i$ are positive and q negative, then


(1.2) $x = p + q, \quad y = p - q.$


An arbitrary point (x, y) can be joined to the origin by a path only if
x and y are of the form (1.2). In this case the p places for the positive


2 For the history and literature see A. Dvoretzky and T. Motzkin, A problem of
arrangements, Duke Mathematical Journal, vol. 14 (1947), pp. 305-313. As these 
authors point out, most of the formally different proofs in reality use the reflection 
principle (lemma 1 of section 2), but without the geometric interpretation this 
principle loses its simplicity and appears as a curious trick. Dvoretzky and Motzkin 
give a new proof of great simplicity and elegance. They generalize the ballot prob-
lem by requiring that at each instant P have at least α times the votes scored by
Q. This work has been continued by M. T. L. Bizley, Derivation of a new formula 
for the number of minimal lattice paths, etc., The Journal of the Institute of Actu- 
aries, vol. 80, Part 1, No. 354 (1954), pp. 55-62. 
$\epsilon_i$ can be chosen from the $x = p + q$ available places in


(1.3) $N_{x,y} = \binom{p+q}{p} = \binom{p+q}{q}$


different ways. It is convenient to define $N_{x,y} = 0$ whenever x, y are
not of the form (1.2). Then there exist exactly $N_{x,y}$ different paths from
the origin to the point (x, y). Bertrand’s ballot theorem asserts that
when $y > 0$ there exist exactly $(y/x)N_{x,y}$ paths satisfying the condi-
tions $s_1 > 0, s_2 > 0, \ldots, s_{x-1} > 0, s_x = y$. It will be proved in sec-
tion 2.


Example. Figure 1 exhibits a path to the point $N_1 = (5, 1)$. There
exist ten such paths of which two satisfy the conditions $s_i > 0$. The
path in the graph is {1, 2, 1, 2, 1}, and the other is {1, 2, 3, 2, 1}.
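
The counts in this example can be checked by brute force. The following is a minimal sketch, not part of the original text: the helper names `paths_to` and `is_positive` are hypothetical, and the code simply enumerates all sequences of five steps of +1 or -1.

```python
from itertools import product
from math import comb

def paths_to(x, y):
    """All paths (s_1, ..., s_x) from the origin to (x, y) with steps +1 or -1."""
    found = []
    for steps in product((1, -1), repeat=x):
        partial, sums = 0, []
        for e in steps:
            partial += e
            sums.append(partial)
        if partial == y:
            found.append(tuple(sums))
    return found

def is_positive(path):
    """True when every vertex s_1, ..., s_x lies strictly above the x-axis."""
    return all(s > 0 for s in path)

x, y = 5, 1
p = (x + y) // 2                      # number of +1 steps, from (1.2)
all_paths = paths_to(x, y)
positive = [s for s in all_paths if is_positive(s)]

print(len(all_paths), comb(x, p))     # both numbers are N_{5,1}
print(sorted(positive))               # the positive paths of the example
```

The ballot theorem predicts that the number of positive paths times x equals y times the total number of paths, which can be confirmed from the two lists.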


Figure 1. Illustrating positive paths and the proof of theorem 2 in section 2. 


We can draw the most interesting conclusions from the ballot theo- 
rem if we drop the convention that the terminal point (x, y) of the path 
be fixed in advance. There exist $2^n$ different paths from the origin to
points (n, y) with an arbitrary ordinate y. As explained in section 3,
these $2^n$ paths may be taken to represent the $2^n$ possible outcomes of
the ideal experiment consisting in n successive tossings of a perfect
coin. The classical description introduces the fictitious gambler
Peter who at each trial wins or loses a unit amount. The sequence
$\{s_1, s_2, \ldots, s_n\}$ then represents Peter's successive cumulative gains, that
is, the excess of the accumulated number of heads over tails. 

If $s_n = 0$, the net gain at the conclusion of the nth trial is zero: there
exists a tie. Ties occur so infrequently that they do not affect the pic-
ture, but repeated references to them are disturbing. We shall there-
fore agree to say that at the nth trial Peter leads if either $s_n > 0$ or
$s_n = 0$ but $s_{n-1} > 0$ (i.e., in case of a tie that player leads who led at
the preceding trial). “Peter leads at the nth trial” is but a description
for “the nth side of the path is above the x-axis.”

The ballot theorem refers to paths situated entirely above the x-axis,
that is, to games in which the lead never changes. This topic may be
pursued further by investigating how often the lead is likely to change
for an arbitrary path. In this connection we reach conclusions that 
play havoc with our intuition. It is generally expected that in a pro- 
longed series of coin tossings Peter should lead about half the time and 
Paul the other half. This is entirely wrong, however. In 20,000 toss- 
ings it is about 88 times more probable that Peter leads in all 20,000 trials 
than that each player leads in 10,000 trials. In general, the lead changes 
at such infrequent intervals that intuition is defied. No matter how 
long the series of tossings, the most probable number of changes of lead 
is zero; exactly one change of lead is more probable than two, two 
changes are more probable than three, etc. In short, if a modern 
educator or psychologist were to describe the long-run case histories 
of individual coin-tossing games, he would classify the majority of 
coins as maladjusted. If many coins are tossed n times each, a sur- 
prisingly large proportion of them will leave one player in the lead 
almost all the time; and in very few cases will the lead change sides 
and fluctuate in the manner that is generally expected of a well-behaved 
coin. 
This is a sample of the conclusions to be drawn from the first arc
sine law (see section 5 and the illustration in section 7). E. Sparre
Andersen has shown that this law has a wide field of applicability, and
the situation here described for coin tossings is typical for chance fluc- 
tuations involving cumulative effects. Most stochastic processes in 
physics, economics, and education are of this nature, and our findings 
should serve as a warning to those who are prone to discern secular 
trends and deviations from average norms. 
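
The figure of "about 88 times" quoted above can be reproduced numerically. The sketch below is an illustration only: it assumes formula (5.1) of section 5, under which the probability that Peter leads in 2k of 2n trials is the product of two central binomial probabilities, and the helper name `log_u` is hypothetical. Logarithms are used so that the very large binomial coefficients do not have to be formed.

```python
from math import exp, lgamma, log

def log_u(two_n):
    """Natural log of u_{2n} = C(2n, n) * 2^(-2n), computed via log-gamma."""
    n = two_n // 2
    return lgamma(two_n + 1) - 2 * lgamma(n + 1) - two_n * log(2)

tosses = 20_000
# Peter leads in all 20,000 trials: probability u_{20000} * u_0 = u_{20000}.
log_all = log_u(tosses)
# Each player leads in 10,000 trials: probability u_{10000} * u_{10000}.
log_even = 2 * log_u(tosses // 2)

ratio = exp(log_all - log_even)
print(round(ratio, 1))   # close to the value of about 88 quoted in the text
```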


The same situation may be viewed from a somewhat different angle. If the coin 
tossing proceeds at a uniform rate, common sense expects that, with due allowance 
for chance fluctuations, a two-day game should produce twice as many ties as a 
one-day game. In other words, we expect intuitively the number of ties to increase 
roughly in proportion to the duration of the game. Paradoxically this is not so: 
The number of ties increases about as the square root of time. In 10,000 tossings the 
median number of ties is 67, but in 1,000,000 tossings it increases only to 674; the 
typical “wavelength” increases from about 150 to about 1500. The average wave- 
length increases with time (sections 6 and 8). The formulas on which these conclu- 
sions are based play an important role for first passage and recurrence times in 
general random walks and diffusion theory. 

Theorem 3 of section 2 stands apart from the remainder and is not used elsewhere. 
It concerns a variant of the ballot problem for the case where the two candidates 
score the same number, n, of votes. Then P leads an even number, 2k, of times and 
Q leads in the remaining 2n — 2k trials. Again we have the false intuition that 
each candidate is likely to lead about half the time, that is, we expect 2k to be 
close to n. Actually, if the ballot ended in a tie n:n, the n + 1 possible divisions of
leads (namely 2n:0, 2n-2:2, 2n-4:4, ..., 2:2n-2, 0:2n) have the same probability
$(n + 1)^{-1}$. This result stands in a marked contrast to the situation described
above where the end result was not prescribed in advance; there the extreme divi- 
sions 2n:0 and 0:2n are most probable. 

It has been pointed out by J. L. Hodges³ that this theorem has statistical appli-
cations to rank-order tests. We illustrate this point by the


Example. Suppose that a quantity (e.g. the height of plants) is measured on 
each of n treated subjects and also on each of n control subjects, obtaining measure-
ments $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$. To fix ideas, suppose that each group is arranged
in decreasing order: $a_1 > a_2 > \cdots$ and $b_1 > b_2 > \cdots$. Let us combine the two
sequences and write the 2n letters $a_1, \ldots, b_n$ in decreasing order. The resulting
arrangement of n letters a and n letters b may be interpreted as the record of a 
ballot in which each candidate received n votes. For an extremely successful treat- 
ment all the a’s should precede the b’s; a completely ineffectual treatment should 
produce a random order. In our arrangement the a’s lead exactly 2k times if k 
different a’s precede the b’s of same rank, that is, if the inequality $a_i > b_i$ holds for
exactly k subscripts. Assuming randomness, the probability that this happens 
equals 1/(n + 1) and therefore the probability that the a’s lead 2k times or more 
is (n — k + 1)/(n +1). The classical example for this argument (used qualita- 
tively without knowledge of the theoretical probabilities) is due to Galton who used 
it in 1876 for data referred to him by Charles Darwin. In his example 2n was 30 
and the a’s were in the lead 26 times. Galton concluded that the treatment was 
efficient, but on the hypothesis of mere randomness even an ineffectual treatment 
would produce 26 or more leads in three out of sixteen similar experiments. This
shows that a quantitative analysis may be a valuable supplement to our rather shaky
intuition. (For related tests based on the theory of runs see chapter II, section 5a.)
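
The arithmetic behind "three out of sixteen" is short enough to spell out. This is a minimal sketch using exact fractions; the function name `prob_at_least` is hypothetical, and the formula is the one stated in the example.

```python
from fractions import Fraction

def prob_at_least(n, k):
    """P{the a's lead 2k times or more} = (n - k + 1)/(n + 1) under randomness."""
    return Fraction(n - k + 1, n + 1)

# Galton's data: 2n = 30 observations, the a's were in the lead 26 = 2k times.
n, k = 15, 13
print(prob_at_least(n, k))
```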


2. PROBLEMS OF ARRANGEMENTS 


Let $A = (a, \alpha)$ and $B = (b, \beta)$ be integral points in the positive
quadrant: $b > a \ge 0$, $\alpha > 0$, $\beta > 0$. By reflection of A on the x-axis is
meant the point $A' = (a, -\alpha)$. (See figure 2.) A path from A to B is
defined as in section 1, with A playing the role of the origin.


Figure 2. Illustrating the reflection principle.


3 Galton’s rank-order test, Biometrika, vol. 42 (1955), pp. 261-262. 
Lemma.⁴ (Reflection principle.) The number of paths from A to B
which touch or cross the x-axis equals the number of all paths from A’ to B.


Proof. Consider a path $\{s_a = \alpha, s_{a+1}, \ldots, s_b = \beta\}$ from A to B
having one or more vertices on the x-axis. Let t be the abscissa of the
first such vertex (see figure 2); that is, choose t so that
$s_a > 0, \ldots, s_{t-1} > 0, s_t = 0$. Then
$\{-s_a, -s_{a+1}, \ldots, -s_{t-1}, s_t = 0, s_{t+1}, s_{t+2}, \ldots, s_b\}$
is a path leading from A’ to B and having T = (t, 0) as its first
vertex on the x-axis. The sections AT and A’T being reflections of
each other, there exists a one-to-one correspondence between all paths
from A’ to B and such paths from A to B as have a vertex on the
x-axis. The lemma is proved.
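
The lemma can be checked numerically on a small case. The following sketch is illustrative only: the function `count_paths` and the sample points A = (1, 2), B = (9, 4) are assumptions chosen for the demonstration, and the count is done by plain enumeration rather than by any formula.

```python
from itertools import product

def count_paths(start, end, must_touch_axis=False):
    """Count +1/-1 paths from start = (a, alpha) to end = (b, beta).

    With must_touch_axis=True only paths having a vertex of ordinate 0
    (the starting vertex included) are counted.
    """
    (a, alpha), (b, beta) = start, end
    total = 0
    for steps in product((1, -1), repeat=b - a):
        height, touched = alpha, alpha == 0
        for e in steps:
            height += e
            touched = touched or height == 0
        if height == beta and (touched or not must_touch_axis):
            total += 1
    return total

A, B = (1, 2), (9, 4)          # hypothetical sample points with alpha, beta > 0
A_reflected = (A[0], -A[1])    # A' = (a, -alpha)

touching = count_paths(A, B, must_touch_axis=True)
from_reflection = count_paths(A_reflected, B)
print(touching, from_reflection)   # the lemma asserts these agree
```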


Theorem 1. (Ballot theorem.) Let $x > 0$, $y > 0$; the number of paths
$\{s_1, s_2, \ldots, s_x = y\}$ from the origin to (x, y) such that
$s_1 > 0, s_2 > 0, \ldots, s_x > 0$ equals $(y/x)N_{x,y}$.

Proof. Since $s_1 = \pm 1$, we have $s_1 = 1$ for each admissible path.
It follows that there exist as many admissible paths as there are paths
leading from the point (1, 1) to (x, y) which neither touch nor cross
the x-axis. By the last lemma the number of such paths equals


(2.1) $N_{x-1,y-1} - N_{x-1,y+1} = \binom{p+q-1}{q} - \binom{p+q-1}{q-1} = \frac{p-q}{p+q}\binom{p+q}{p} = \frac{y}{x} N_{x,y}.$

The Duality Principle. Almost every theorem on paths can be reformulated
to obtain a formally different theorem. Consider $\{s_1, \ldots, s_x\}$ and the path ob-
tained from it by reversing the order of the $\epsilon_i$, that is, the path $\{s_1^*, s_2^*, \ldots, s_x^*\}$
where $s_1^* = \epsilon_x$, $s_2^* = \epsilon_x + \epsilon_{x-1}$, $s_3^* = \epsilon_x + \epsilon_{x-1} + \epsilon_{x-2}, \ldots$ or


(2.2) $s_0^* = 0, \quad s_1^* = s_x - s_{x-1}, \quad \ldots, \quad s_i^* = s_x - s_{x-i}, \quad \ldots, \quad s_x^* = s_x.$


The two paths (1.1) and (2.2) are congruent and are obtained from each other by
a rotation through 180 degrees; they join the same endpoints. To each theorem on
paths there corresponds a dual theorem obtained by applying it to the reversed path (2.2).


For example, the ballot theorem gives us the number of reversed paths
$\{s_1^*, \ldots, s_x^*\}$ joining the origin to (x, y) such that $s_i^* > 0$ for $i = 1, 2, \ldots, x$. But this is


4 The probability literature attributes this method to D. André (1887). The text 
reduces it to a lemma on random walks. The classical difference equations of ran- 
dom walks (chapter XIV) closely resemble differential equations, and the reflection 
principle (even a stronger form of it) is familiar in that theory under the name of 
Lord Kelvin's method of images. 


the same as $s_x > s_{x-i}$ for $i = 1, 2, \ldots, x-1$ and hence we have as an alternative
form of the ballot theorem


Theorem 1*. The number of paths $\{s_1, s_2, \ldots, s_x\}$ from (0, 0) to (x, y) such that
$s_1 < s_x, s_2 < s_x, \ldots, s_{x-1} < s_x$ (where $s_x = y > 0$) equals $(y/x)N_{x,y}$.

Geometrically speaking, theorem 1 is concerned with paths whose left endpoint 
is the lowest vertex, whereas the dual theorem 1* refers to paths whose last vertex 
is highest. (See figure 3.) Theorem 1* has implications for first-passage times in 
random walks. 


Figure 3. Illustrating first passages and returns to the origin.


We turn to a study of paths joining the origin to a point N = (2n, 0)
of the x-axis (an odd vertex on the x-axis is impossible). Put for
abbreviation


(2.3) $L_{2n} = \frac{1}{n+1}\binom{2n}{n}.$


Theorem 2. Among the $\binom{2n}{n}$ paths joining the origin to the point 2n
of the x-axis there are
(a) exactly $L_{2n-2}$ paths such that

(2.4) $s_1 > 0, \quad s_2 > 0, \quad \ldots, \quad s_{2n-1} > 0, \quad (s_{2n} = 0)$

(b) exactly $L_{2n}$ paths such that

(2.5) $s_1 \ge 0, \quad s_2 \ge 0, \quad \ldots, \quad s_{2n-1} \ge 0, \quad (s_{2n} = 0).$


(That is, there are as many paths to 2n with all inner vertices above the
x-axis as there are paths to 2n - 2 with no vertex below the x-axis.)
Proof. (See figure 1.) Each path satisfying condition (2.4) passes
through the point $N_1 = (2n-1, 1)$ and by theorem 1 the number of
paths to $N_1$ such that $s_1 > 0, \ldots, s_{2n-2} > 0$ equals


(2.6) $\frac{1}{2n-1}\binom{2n-1}{n-1} = \frac{1}{n}\binom{2n-2}{n-1} = L_{2n-2}.$


This proves (a). Again, let a path satisfy condition (2.4). Omitting
the first and the last side we get a path that joins the point $O_1 = (1, 1)$
to $N_1 = (2n-1, 1)$ and at the same time is such that all its vertices
lie on or above the line y = 1. Translating the origin to $O_1$, we get a
path from the new origin to the point $N_1$ (which has the new coordi-
nates 2n-2 and 0), none of whose vertices lies below the new x-axis.
We have thus established a one-to-one correspondence between such
paths and all paths satisfying (2.4), and the theorem is proved.
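
Both parts of theorem 2 can be confirmed by enumeration for small n. The sketch below is illustrative; `catalan_L` and `count_returning_paths` are hypothetical helper names, and the loop bound n < 7 is chosen only to keep the enumeration fast.

```python
from itertools import product
from math import comb

def catalan_L(two_n):
    """L_{2n} = C(2n, n) / (n + 1), as in (2.3)."""
    n = two_n // 2
    return comb(2 * n, n) // (n + 1)

def count_returning_paths(two_n, strict):
    """Count paths with s_{2n} = 0 whose inner vertices are > 0 (strict) or >= 0."""
    count = 0
    for steps in product((1, -1), repeat=two_n):
        sums, partial = [], 0
        for e in steps:
            partial += e
            sums.append(partial)
        if partial != 0:
            continue
        inner = sums[:-1]
        ok = all(s > 0 for s in inner) if strict else all(s >= 0 for s in inner)
        if ok:
            count += 1
    return count

for n in range(1, 7):
    a = count_returning_paths(2 * n, strict=True)    # condition (2.4)
    b = count_returning_paths(2 * n, strict=False)   # condition (2.5)
    print(n, a, catalan_L(2 * n - 2), b, catalan_L(2 * n))
```

In each printed row the second and third numbers should agree (part a), as should the fourth and fifth (part b).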


As explained in section 1 the following theorem stands apart from the remainder 
and will not be used in the sequel. 


Theorem 3.⁵ Let $L_{2k,2n}$ be the number of paths from the origin to the point 2n of
the x-axis such that 2k of its sides lie above the x-axis and 2n - 2k below ($k = 0, 1, \ldots, n$).
Then $L_{2k,2n} = L_{2n}$, independently of k.


Proof. The assertion is trivially true for n = 1 and we assume by induction that
$L_{2k,2\nu} = L_{2\nu}$ for $\nu = 1, 2, \ldots, n-1$ and $0 \le k \le \nu$. We propose to count the num-
ber of paths $\{s_1, s_2, \ldots, s_{2n} = 0\}$ with exactly 2k sides above the x-axis. First
assume $1 \le k \le n-1$. Such a path crosses the x-axis and we denote by 2r the
abscissa of its first vertex on the x-axis. We have then to consider two classes of
paths.

A path of the first class is positive between 0 and 2r, and its section between 2r
and 2n contains exactly 2k - 2r sides above the axis. Here $k \ge r$. By theorem
2(a) there exist $L_{2r-2}$ paths $\{s_1, \ldots, s_{2r-1}, s_{2r} = 0\}$ with $s_1 > 0, \ldots, s_{2r-1} > 0$, and
by the induction hypothesis there exist $L_{2k-2r,2n-2r} = L_{2n-2r}$ paths joining (2r, 0)
to (2n, 0) and having 2k - 2r sides above the x-axis. Accordingly, there exists a
total of $L_{2r-2}L_{2n-2r}$ paths of this class.

A path of the second class is negative between 0 and 2r; its section between 2r
and 2n then contains 2k sides above the x-axis. By the argument above there
exist again $L_{2r-2}L_{2n-2r}$ paths of this class, but this time $n - r \ge k$.

It follows that for $k = 1, \ldots, n-1$


(2.7) $L_{2k,2n} = \sum_{r=1}^{k} L_{2r-2}L_{2n-2r} + \sum_{r=1}^{n-k} L_{2r-2}L_{2n-2r}.$


By changing the summation index to $\rho = n - r + 1$, the terms of the second


5 First proved by complicated analytical methods by K. L. Chung and W. Feller,
Fluctuations in coin tossing, Proceedings National Academy of Sciences USA, vol. 
35 (1949), pp. 605-608 (see also the first edition of the present book, chapter XII, 
problem 4). An elegant combinatorial proof was given by J. L. Hodges (see foot- 
note 3). 
sum become $L_{2r-2}L_{2n-2r} = L_{2\rho-2}L_{2n-2\rho}$ with $\rho$ running from k + 1 to n. Thus


(2.8) $L_{2k,2n} = \sum_{r=1}^{n} L_{2r-2}L_{2n-2r}$


which is independent of k.

A path with all 2n sides above the x-axis is a path of the sort described in theorem
2(b), and hence $L_{2n,2n} = L_{2n}$. For reasons of symmetry we have also $L_{0,2n} = L_{2n}$.
The total number of paths from the origin to (2n, 0) being $(n + 1)L_{2n}$, it follows
that $L_{2k,2n} = L_{2n}$ for $k = 0, 1, \ldots, n$.


As a corollary we find the identity

(2.9) $L_{2n} = \sum_{r=1}^{n} L_{2r-2}L_{2n-2r}.$


For a direct analytic verification see section 8(a). 
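
Theorem 3 and the identity (2.9) lend themselves to a direct numerical check. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the value n = 5 is an arbitrary small case, and a side is counted as lying above the axis when either of its endpoints is positive.

```python
from collections import Counter
from itertools import product
from math import comb

def L(two_m):
    """L_{2m} = C(2m, m) / (m + 1), see (2.3)."""
    m = two_m // 2
    return comb(2 * m, m) // (m + 1)

def sides_above(steps):
    """Number of sides of the path lying above the x-axis."""
    above, previous = 0, 0
    for e in steps:
        current = previous + e
        if previous > 0 or current > 0:
            above += 1
        previous = current
    return above

def lead_counts(n):
    """Map 2k -> number of paths to (2n, 0) with exactly 2k sides above the axis."""
    counts = Counter()
    for steps in product((1, -1), repeat=2 * n):
        if sum(steps) == 0:
            counts[sides_above(steps)] += 1
    return counts

n = 5
counts = lead_counts(n)
convolution = sum(L(2 * r - 2) * L(2 * n - 2 * r) for r in range(1, n + 1))
print(sorted(counts.items()))    # theorem 3: the same count for every 2k
print(L(2 * n), convolution)     # identity (2.9)
```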


3. RANDOM WALKS AND COIN TOSSING 


In a sequence of N tossings of an ideal coin let $\epsilon_k = +1$ if the kth
trial results in heads and $\epsilon_k = -1$ otherwise. Then
$s_k = \epsilon_1 + \epsilon_2 + \cdots + \epsilon_k$ is the cumulative excess of heads over tails at the conclusion
of the kth trial. In classical betting language $s_k$ is “Peter’s accumu-
lated net gain.” Each possible outcome of the N successive tossings
is represented by a path of N sides starting at the origin, and conversely 
each such path may be taken as representing the outcome of N tossings. 

This consideration leads us to take for our sample space the aggregate
of the $2^N$ paths $\{s_1, \ldots, s_N\}$ starting at the origin and to attribute proba-
bility $2^{-N}$ to each.

An event such as “heads at the first two trials” must be interpreted
as the aggregate of all sequences starting with $s_1 = 1$, $s_2 = 2$. There
are $2^{N-2}$ such sequences and the probability of this event is therefore
$2^{-2}$, as is proper. More generally, if $k < N$ there exist exactly $2^{N-k}$
different paths $\{s_1, s_2, \ldots, s_N\}$ such that their first k vertices lie on a
preassigned path $\{s_1, s_2, \ldots, s_k\}$. It follows that an event determined
by the outcome of the first k < N trials has a probability independent of 
N. In practice, therefore, the number N plays no role, provided it is 
sufficiently large. Conceptually and formally it is best to consider each 
finite sequence of tossings as the beginning of a potentially infinite se- 
quence, but this would lead us into non-denumerable sample spaces. 
We shall therefore consider finite sequences with N larger than the 
number of trials occurring in the formulas; except for this we shall be 
permitted, and be glad, to forget about N.
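
That the length N of the sample sequences plays no role can be seen concretely for the event "heads at the first two trials". This is a minimal sketch with a hypothetical function name; it counts favourable paths in the sample space of all paths of length N for several values of N.

```python
from fractions import Fraction
from itertools import product

def prob_heads_first_two(N):
    """P{s_1 = 1, s_2 = 2} in the sample space of all 2^N paths of length N."""
    favourable = sum(
        1 for steps in product((1, -1), repeat=N) if steps[:2] == (1, 1)
    )
    return Fraction(favourable, 2 ** N)

for N in range(2, 9):
    print(N, prob_heads_first_two(N))   # 2^(N-2) favourable paths out of 2^N
```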

For the probabilistic background and the connection with related 
topics it is desirable to supplement the geometric language by an alter- 
native terminology. We imagine the coin tossings performed at a uni- 
form rate, so that the nth trial occurs at time n. Peter may mark his 
cumulative gain at all times by an indicator which we shall call “par-
ticle.” This particle, then, moves on a vertical axis starting from the
origin. It moves at times 1, 2, ... one unit step upward if the coin 
lands heads, one unit step downward if the coin lands tails. We say 
that the particle performs a symmetric random walk. (The physicist 
takes it as the simplest model for one-dimensional diffusion; see chap- 
ter XIV.) 

At time n the position of the particle is the point $s_n$ of the vertical
axis. The path $\{s_1, s_2, \ldots, s_N\}$ represents the space-time diagram of the
random walk, the x-axis playing the role of the time axis. 

Guided by this background we introduce the following 


Terminology. We shall say that at time n there takes place:
A return to the origin if $s_n = 0$.
A first return to the origin if


(3.1) $s_1 \ne 0, \quad s_2 \ne 0, \quad \ldots, \quad s_{n-1} \ne 0, \quad s_n = 0.$

A first passage through $r > 0$ if

(3.2) $s_1 < r, \quad s_2 < r, \quad \ldots, \quad s_{n-1} < r, \quad s_n = r.$


A second, third, ... return to the origin and a first passage through 
r < 0 are defined in an obvious way. Note that passages through the 
origin can take place only at even times, and we shall frequently restrict 
the formulas to even times. In betting language a return to the origin 
represents an equalization of the accumulated numbers of heads and tails. 
(Figure 3 exhibits two paths in which the first passages and returns to 
the origin, respectively, are marked; the second path has the peculiarity 
of keeping to the negative side.) 
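
The terminology translates directly into code. The following sketch is illustrative only: the function names are hypothetical, and the simulated walk uses a fixed seed purely so that the run is repeatable. Because the steps are of unit size, the first time at which the path equals r > 0 is automatically a first passage in the sense of (3.2).

```python
import random
from itertools import accumulate

def returns_to_origin(path):
    """Times n (1-based) at which s_n = 0; the first entry is the first return."""
    return [n for n, s in enumerate(path, start=1) if s == 0]

def first_passage(path, r):
    """Time of the first passage through r > 0, as in (3.2), or None if none occurs."""
    for n, s in enumerate(path, start=1):
        if s == r:
            return n
    return None

rng = random.Random(0)                       # fixed seed, for repeatability only
steps = [rng.choice((1, -1)) for _ in range(50)]
path = list(accumulate(steps))               # s_1, ..., s_50

print(returns_to_origin(path))               # every entry is an even time
print(first_passage(path, 3))
```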


4. REFORMULATION OF THE COMBINATORIAL THEOREMS


In the following sections we shall use the notations


(4.1) $u_{2n} = \binom{2n}{n} 2^{-2n}, \quad n = 0, 1, 2, \ldots$

and

(4.2) $f_0 = 0, \quad f_{2n} = \frac{1}{2n} u_{2n-2}, \quad n = 1, 2, \ldots$


It is easily verified that

(4.3) $f_{2n} = u_{2n-2} - u_{2n}.$

Theorem 1. For each $n \ge 1$:

(4.4) $u_{2n} = P\{s_{2n} = 0\}$

(4.5) $u_{2n} = P\{s_1 \ne 0, s_2 \ne 0, \ldots, s_{2n} \ne 0\}$

(4.6) $u_{2n} = P\{s_1 \ge 0, s_2 \ge 0, \ldots, s_{2n} \ge 0\}$


or in words: The three events, (a) a return to the origin takes place at time
2n, (b) no return occurs up to and including time 2n, and (c) the path
is non-negative between 0 and 2n, have the common probability $u_{2n}$.


Furthermore,

(4.7) $f_{2n} = P\{s_1 \ne 0, s_2 \ne 0, \ldots, s_{2n-1} \ne 0, s_{2n} = 0\}$

(4.8) $f_{2n} = P\{s_1 \ge 0, s_2 \ge 0, \ldots, s_{2n-2} \ge 0, s_{2n-1} < 0\}$


that is: the two events (a) the first return to the origin takes place at time
2n, and (b) the first passage through -1 occurs at time 2n - 1, have the
common probability $f_{2n}$.


Proof. As was observed in section 3 it suffices to consider the sample
space of paths of the fixed length 2n. By (1.3) there exist $\binom{2n}{n}$ paths
joining the origin to the point (2n, 0), and this proves (4.4).

By theorem 2(a) in section 2 there exist $L_{2n-2}$ paths joining the
origin to (2n, 0) such that $s_1 > 0, \ldots, s_{2n-1} > 0$. Therefore there are
twice as many paths satisfying the condition in (4.7), and the corre-
sponding probability is $2L_{2n-2} \cdot 2^{-2n} = f_{2n}$. Theorem 2(b) in section 2
implies in the same way (4.8).

The probability that no zero occurs up to and including time 2n
equals one minus the probability of a first return to the origin at a
time $\le 2n$. Using (4.7) this difference is

(4.9) $1 - f_2 - f_4 - \cdots - f_{2n} = 1 - (1 - u_2) - (u_2 - u_4) - \cdots - (u_{2n-2} - u_{2n}) = u_{2n}$

which proves (4.5). Similarly, the right side in (4.6) equals one minus
the probability of a first passage through -1 before time 2n, and using
(4.8) this difference is again given by (4.9). This accomplishes the
proof.

Corollary. It follows that for $n \ge 1$


(4.10) $u_{2n} = \sum_{r=1}^{n} f_{2r} u_{2n-2r}.$


Proof. If a return to the origin takes place at time 2n, then the first
return must take place at some time $2r \le 2n$. We have just seen that
the number of paths from the origin to (2n, 0) with the first return to
the origin taking place at time $2r \le 2n$ equals $2^{2r} f_{2r} \cdot 2^{2n-2r} u_{2n-2r}$.
Summing over r, we get equation (4.10). (For a direct analytic proof
see section 8(a). In chapter XIII, section 3, we shall see that (4.10) is
a special case of the basic equation for recurrent events.)
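
The relations of this section can be verified exactly for a small case. The sketch below is an illustration, not the author's method: the helpers `u`, `f`, and `prob` are hypothetical names, exact fractions are used throughout, and n = 4 is an arbitrary choice small enough for full enumeration of the 256 paths.

```python
from fractions import Fraction
from itertools import accumulate, product
from math import comb

def u(two_n):
    """u_{2n} = C(2n, n) 2^{-2n}, formula (4.1)."""
    n = two_n // 2
    return Fraction(comb(2 * n, n), 4 ** n)

def f(two_n):
    """f_0 = 0 and f_{2n} = u_{2n-2} / (2n), formula (4.2)."""
    return Fraction(0) if two_n == 0 else u(two_n - 2) / two_n

def prob(two_n, event):
    """Exact probability of an event over all 2^{2n} equally likely paths."""
    hits = sum(
        1 for steps in product((1, -1), repeat=two_n)
        if event(list(accumulate(steps)))
    )
    return Fraction(hits, 2 ** two_n)

n = 4
m = 2 * n
checks = {
    "(4.3)": f(m) == u(m - 2) - u(m),
    "(4.4)": prob(m, lambda s: s[-1] == 0) == u(m),
    "(4.5)": prob(m, lambda s: all(v != 0 for v in s)) == u(m),
    "(4.6)": prob(m, lambda s: all(v >= 0 for v in s)) == u(m),
    "(4.7)": prob(m, lambda s: all(v != 0 for v in s[:-1]) and s[-1] == 0) == f(m),
    "(4.8)": prob(m, lambda s: all(v >= 0 for v in s[:-2]) and s[-2] < 0) == f(m),
    "(4.10)": u(m) == sum(f(2 * r) * u(m - 2 * r) for r in range(1, n + 1)),
}
print(u(m), f(m))
print(checks)
```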


Theorem 1* in section 2 enumerates the paths in which a first passage through
y occurs at time x. The sum x + y must be even, and for our purposes it is con-
venient to put x = 2n - y. The content of theorem 1* may then be restated as
follows.


Theorem 2. The probability that a first passage through $y > 0$ takes place at
time 2n - y is given by


(4.11) $f_{2n}^{(y)} = \frac{y}{2n-y}\binom{2n-y}{n} 2^{-2n+y}, \quad n \ge y > 0.$


The simplicity with which the duality principle delivered this important formula 
as a direct consequence of the ballot theorem is truly remarkable. A direct analytic 
derivation of (4.11) is difficult and requires special tricks. 

In principle, the probabilities $f_{2n}^{(y)}$ can be calculated by induction on y. A path
of length 2n - y - 1 in which a first passage through y + 1 occurs at the terminal
point may be decomposed into two segments (see figure 3 for y = 4). The first
segment is the path from the origin up to the point of the first passage through y;
it occurs at some time $2\nu - y < 2n - y - 1$. This section is followed by the sec-
ond, a section of length $2n - 2\nu - 1$ in which the terminal endpoint is the only
one lying above the left endpoint. In other words, if its left endpoint is taken as
the origin, the second section represents a path with a first passage through 1 at
the endpoint. By definition there exist $2^{2\nu-y} f_{2\nu}^{(y)}$ sections of the first type and
$2^{2n-2\nu-1} f_{2n-2\nu}^{(1)}$ of the second, and any two can be combined to give a path with
first passage through y + 1 at time 2n - y - 1. Therefore


(4.12) $f_{2n}^{(y+1)} = \sum_{\nu=y}^{n-1} f_{2\nu}^{(y)} f_{2n-2\nu}^{(1)}, \quad n \ge y + 1.$


Formula (4.8) states that a first passage through -1 (and hence also through +1)
at time 2n - 1 has probability $f_{2n}$, that is,


(4.13) $f_{2n}^{(1)} = f_{2n}, \quad n \ge 1.$


Equations (4.12) and (4.13) determine recursively all $f_{2n}^{(y)}$, but it is not easy to verify
that (4.11) satisfies (4.12), and it is not at all clear how the explicit formula (4.11)
could be derived from (4.12).

Formulas (4.12)-(4.13) permit a novel conclusion. We see from (4.13) that $f_{2n}$
is the probability that the first return to zero occurs at time 2n. Forgetting about
the preceding theorem, let us now define $f_{2n}^{(y)}$ as the probability that the yth return
to zero takes place at time 2n. The argument used in the last proof applies with-
out change: Splitting a path from the origin to the (y+1)st return into the initial
section leading to the yth return and the terminal section between the yth and the
(y+1)st return, we see again that (4.12) holds. Since this relation uniquely de-
termines all $f_{2n}^{(y)}$ we have

Theorem 3. The probability that the yth return to zero takes place at time 2n is 
given by (4.11). 

Alternative geometric proof. Consider a path leading from the origin to a first
passage through y at time 2n - y. (Figure 3 exhibits the case y = 5, 2n - y = 15.)
Construct a new path by inserting into this path y new sides each of slope -1
and having left endpoints, respectively, at the origin and the y - 1 vertices at
which a first passage through 1, 2, ..., y-1 takes place. The new path, say
$\{\sigma_1, \sigma_2, \ldots, \sigma_{2n}\}$, has length 2n. Clearly $\sigma_1 \le 0, \ldots, \sigma_{2n-1} \le 0$, $\sigma_{2n} = 0$, and
exactly y - 1 interior vertices lie on the x-axis. Conversely, each path $\{\sigma_1, \ldots, \sigma_{2n}\}$
with this property is obtained, in the manner described, from a path with first
passage through y at time 2n - y. If $f_{2n}^{(y)}$ is defined as in theorem 2, we see that
there exist exactly $2^{2n-y} f_{2n}^{(y)}$ paths $\{\sigma_1, \ldots, \sigma_{2n}\}$ such that $\sigma_i \le 0$, $\sigma_{2n} = 0$, and
exactly y - 1 interior vertices lie on the x-axis. Such a path consists of y sections
with endpoints on the x-axis, and we can produce $2^y$ different paths by changing
the signs of all $\epsilon_i$ of one or more such sections. In this way we obtain all paths of
length 2n with $s_{2n} = 0$ and exactly y - 1 inner vertices on the x-axis, and their
number is therefore $2^{2n} f_{2n}^{(y)}$, as asserted.
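
Formula (4.11), the recursion (4.12), and theorem 3 can all be compared on one small case. The sketch below is illustrative: the function names are hypothetical, the values n = 5, y = 3 are arbitrary, and the brute-force functions simply enumerate every path of the required length.

```python
from fractions import Fraction
from itertools import accumulate, product
from math import comb

def f_passage(two_n, y):
    """Formula (4.11): y/(2n-y) * C(2n-y, n) * 2^{-(2n-y)}, for n >= y > 0."""
    n = two_n // 2
    return Fraction(y * comb(2 * n - y, n), (2 * n - y) * 2 ** (2 * n - y))

def first_passage_prob(y, time):
    """Brute force: P{first passage through y occurs exactly at the given time}."""
    hits = 0
    for steps in product((1, -1), repeat=time):
        s = list(accumulate(steps))
        if s[-1] == y and all(v < y for v in s[:-1]):
            hits += 1
    return Fraction(hits, 2 ** time)

def yth_return_prob(y, two_n):
    """Brute force: P{the y-th return to zero occurs exactly at time 2n}."""
    hits = 0
    for steps in product((1, -1), repeat=two_n):
        s = list(accumulate(steps))
        if s[-1] == 0 and s.count(0) == y:
            hits += 1
    return Fraction(hits, 2 ** two_n)

n, y = 5, 3
formula = f_passage(2 * n, y)
passage = first_passage_prob(y, 2 * n - y)       # theorem 2
returns = yth_return_prob(y, 2 * n)              # theorem 3
# Recursion (4.12): sum over nu = y, ..., n-1 of f_{2nu}^{(y)} f_{2n-2nu}^{(1)}
recursion = sum(f_passage(2 * v, y) * f_passage(2 * n - 2 * v, 1) for v in range(y, n))
print(formula, passage, returns)
print(f_passage(2 * n, y + 1), recursion)
```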


5. PROBABILITY OF LONG LEADS: THE FIRST ARC SINE LAW


We shall say that the particle spends the time from k - 1 to k on the
positive side if the kth side of its path lies above the x-axis, that is, if at
least one of the two vertices $s_{k-1}$ and $s_k$ is positive (in which case the
other is positive or zero). In the betting terminology this means that
at both the (k-1)st and the kth trial Peter’s accumulated gain was
non-negative.

The paradoxical properties of the paths mentioned in section 1 will
be derived from the following


Theorem 1. Let $p_{2k,2n}$ be the probability that in the time interval
from 0 to 2n the particle spends 2k time units on the positive side and
2n - 2k time units on the negative side. Then


(5.1) $p_{2k,2n} = u_{2k} u_{2n-2k}.$

(Note that the total time spent on the positive side is necessarily even.)


Proof. The probability that the particle keeps to the positive side 
during the entire time interval from 0 to 2n is given by formula (4.6), 


6 First proved by complicated analytical methods by K. L. Chung and W. Feller
(see footnote 5 and the first edition of the present book, chapter XII, sections 5
and 6). The theorem was suggested by the work of E. Sparre Andersen (see foot-
note 1).
and we see that $p_{2n,2n} = u_{2n}$ as asserted. For reasons of symmetry
we have also $p_{0,2n} = u_{2n}$, and it remains only to prove (5.1) for
$1 \le k \le n-1$. For that purpose we repeat the argument which led
to (2.7). A particle that keeps for 2k > 0 time units to the positive
side and for 2n - 2k > 0 time units to the negative side necessarily
passes through zero. Let 2r be the moment of its first return to zero.
Then the path belongs to one of the following two classes.

In the first class, up to time 2r the particle stays on the positive
side, and during the time interval from 2r to 2n it spends exactly
$2k - 2r \ge 0$ time units on the positive side. There exist $2^{2r} f_{2r}$ paths
of length 2r which return to the origin for the first time at 2r, and half
of them keep to the positive side. Furthermore, by definition, there
are $2^{2n-2r} p_{2k-2r,2n-2r}$ paths of length 2n - 2r starting at (2r, 0) and
having exactly 2k - 2r sides above the x-axis. Thus the total number
of paths of length 2n in the first class equals


$\frac{1}{2} \cdot 2^{2r} f_{2r} \cdot 2^{2n-2r} p_{2k-2r,2n-2r} = 2^{2n-1} f_{2r} p_{2k-2r,2n-2r}.$


In the second class, from 0 to 2r the particle keeps to the negative
side, and between 2r and 2n it spends 2k time units on the positive side.
Here $2k \le 2n - 2r$ and the argument above shows that the number
of paths in this class equals $2^{2n-1} f_{2r} p_{2k,2n-2r}$.

It follows that for $1 \le k \le n - 1$


(5.2) $p_{2k,2n} = \frac{1}{2} \sum_{r=1}^{k} f_{2r} p_{2k-2r,2n-2r} + \frac{1}{2} \sum_{r=1}^{n-k} f_{2r} p_{2k,2n-2r}.$

Suppose now by induction that $p_{2k,2\nu} = u_{2k} u_{2\nu-2k}$ for $\nu = 1, 2, \ldots, n-1$
(this relation being trivially true for $\nu = 1$). Then formula (5.2)
reduces to


(5.3) $p_{2k,2n} = \frac{1}{2} u_{2n-2k} \sum_{r=1}^{k} f_{2r} u_{2k-2r} + \frac{1}{2} u_{2k} \sum_{r=1}^{n-k} f_{2r} u_{2n-2k-2r}.$


In view of equation (4.10), the first sum equals $u_{2k}$ and the second
equals $u_{2n-2k}$ and therefore (5.1) holds.


We feel intuitively that the fraction k/n of the total time spent on the
positive side is most likely to be close to 1/2. However, the opposite is
true: The possible values close to 1/2 are least probable and the extreme
values k/n = 0 and k/n = 1 have the greatest probability. This assertion
can be verified using a ratio test on (5.1).

Table 1 illustrates the paradox. In betting terminology it reveals
the startling fact that in 2n = 20 tossings of a perfect coin with proba-
bility 0.3524 the less fortunate player will never be in the lead. In most
cases (with probability 0.5379) the accumulated gain of the less fortu-
nate player will be positive just once or never. By contrast, an equal
division 10:10 of the leads has a probability of only 0.0606.


TABLE 1

DISTRIBUTION OF LEADS IN 20 TOSSES OF A COIN

              | k = 0  | k = 2  | k = 4  | k = 6  | k = 8  | k = 10
              | k = 20 | k = 18 | k = 16 | k = 14 | k = 12 |
$p_{k,20}$ =  | 0.1762 | 0.0927 | 0.0736 | 0.0655 | 0.0617 | 0.0606
$P_{k,20}$ =  | 0.3524 | 0.5379 | 0.6851 | 0.8160 | 0.9394 | 1

$p_{k,20} = u_k u_{20-k}$ is the probability that k sides of the path are above
the axis, i.e., “Peter leads during exactly k out of the 20 trials.”
$P_{k,20}$ is the probability that one of the players is in the lead for at
least k trials, the other for at most 20 − k trials.
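
The figures quoted for 20 tosses can be checked numerically. The following is only a sketch: the function names are ours, and the brute-force check rests on the assumption that a side of the path counts as "above the axis" when the sum of its two endpoint ordinates is positive.

```python
from itertools import product
from math import comb


def u(two_n: int) -> float:
    """u_{2n} = C(2n, n) / 2^{2n}, the probability of a return to zero at time 2n."""
    n = two_n // 2
    return comb(2 * n, n) / 4 ** n


def lead_distribution(two_n: int) -> dict:
    """p_{2k,2n} = u_{2k} * u_{2n-2k} for 2k = 0, 2, ..., 2n, following (5.1)."""
    return {k2: u(k2) * u(two_n - k2) for k2 in range(0, two_n + 1, 2)}


def lead_distribution_brute(two_n: int) -> dict:
    """Enumerate all 2^{2n} paths and count the sides lying above the axis."""
    counts = {k: 0 for k in range(two_n + 1)}
    for steps in product((1, -1), repeat=two_n):
        s, above = 0, 0
        for x in steps:
            # The side joining ordinates s and s + x is above the axis
            # when the two ordinates sum to a positive number.
            if 2 * s + x > 0:
                above += 1
            s += x
        counts[above] += 1
    return {k: c / 2 ** two_n for k, c in counts.items()}


table = lead_distribution(20)
for k2 in range(0, 12, 2):
    print(k2, round(table[k2], 4))
```

The enumeration is feasible only for short paths (say 2n ≤ 16); for longer ones the product formula is the practical route.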


Formula (5.1), although exact, is not very revealing, and it is
preferable to replace it by a simpler approximation. An easy application
of Stirling’s formula II(9.1) shows that $u_{2n}(\pi n)^{1/2} \to 1$ as n → ∞.
[This is the content of problem II(12.20).] It follows that

(5.4)   $p_{2k,2n} \sim \frac{1}{\pi\{k(n-k)\}^{1/2}}$

where the ratio of the two sides tends rapidly to unity as k → ∞ and
n − k → ∞. The probability that the fraction k/n of the time spent
on the positive side lies between ½ and α (½ < α < 1) is given by

(5.5)   $\sum_{n/2 < k < \alpha n} p_{2k,2n} \sim \frac{1}{\pi n} \sum_{n/2 < k < \alpha n} \left\{\frac{k}{n}\left(1 - \frac{k}{n}\right)\right\}^{-1/2}.$

On the right side we recognize the Riemann sum approximating the
integral

(5.6)   $\frac{1}{\pi}\int_{1/2}^{\alpha} \frac{dx}{\{x(1-x)\}^{1/2}} = 2\pi^{-1} \arcsin \alpha^{1/2} - \tfrac12.$

For reasons of symmetry the probability that k/n < ½ tends to ½ as
n → ∞. Adding this probability to (5.5), we get

Theorem 2.⁷ (The first arc sine law.) For fixed α (0 < α < 1) and
n → ∞ the probability that the fraction k/n of time spent on the positive
side be < α tends to

(5.7)   $\frac{1}{\pi}\int_{0}^{\alpha} \frac{dx}{\{x(1-x)\}^{1/2}} = 2\pi^{-1} \arcsin \alpha^{1/2}.$
In practice formula (5.7) provides an excellent approximation even
for values of n as small as 20. The integrand in (5.7) is represented
by a U-shaped curve tending to infinity at the endpoints 0 and 1. This
shows in a striking fashion that the fraction of time spent on the
positive side is much more likely to be close to zero or to one than to the
“expected” or “normal” value ½. Figure 4 will reveal:
With probability 0.20 the particle stays for about 97.6 per cent of the
time on the same side of the origin. In one out of 10 cases the particle
will spend 99.4 per cent of the time on the same side. Another illustration
is given in table 2.

Figure 4. The arc sine law.

⁷ Paul Lévy (Sur certains processus stochastiques homogènes, Compositio
Mathematica, vol. 7 (1939), pp. 283-339) found the arc sine law for certain
continuous diffusion processes and referred to the connection with the
coin-tossing game. A general arc sine law for the number of positive partial
sums in a sequence of mutually independent random variables was proved by
P. Erdös and M. Kac, On the number of positive sums of independent random
variables, Bulletin of the American Mathematical Society, vol. 53 (1947),
pp. 1011-1020. It was E. Sparre Andersen who discovered the combinatorial
nature of the arc sine law and its validity for general classes of random
variables.

TABLE 2

ILLUSTRATING THE ARC SINE LAW

   p        $t_p$

  0.9     153.95 days
   .8     126.10 days
   .7      99.65 days
   .6      75.23 days
   .5      53.45 days
   .4      34.85 days
   .3      19.89 days
   .2       8.93 days
   .1       2.24 days
   .05     13.5 hours
   .02      2.16 hours
   .01     32.4 minutes

A coin is tossed once per second for a total of 365 days; let Z be the fraction
of time during which the less fortunate player is in the lead. Then $t_p$ is a
number such that the event Z < $t_p$ has probability p, approximately.


This table shows the probability p that the less fortunate player will
be in the lead for a total of less than $t_p$ days of a full year. Using, for
example, the significance level p = 0.05 dear to statisticians, we see 
that in one out of 20 cases the more fortunate player will be in the lead 
for more than 364 days and 10 hours. Few people will believe that a 
perfect coin will produce preposterous sequences in which no change 
of lead occurs for millions of trials in succession, and yet this is what 
a good coin will do rather regularly. 
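
The entries of table 2 can be recovered from the arc sine law. The sketch below assumes (this step is ours, not spelled out in the text) that the less fortunate player's fraction Z of the time satisfies P{Z < x} ≈ 2 · 2π⁻¹ arc sin x^{1/2} for x ≤ ½, by symmetry and (5.7); solving for x gives t_p = 365 sin²(πp/4) days. The function name is hypothetical.

```python
from math import pi, sin


def t_p_days(p: float, total_days: float = 365.0) -> float:
    """Time t_p (in days) with P{Z < t_p} = p under the arc sine approximation.

    Assumes P{Z < x} = (4/pi) * arcsin(sqrt(x)) for the less fortunate player,
    so that x = sin(pi * p / 4) ** 2.
    """
    return total_days * sin(pi * p / 4) ** 2


for p in (0.9, 0.5, 0.1):
    print(p, round(t_p_days(p), 2), "days")
print(0.05, round(t_p_days(0.05) * 24, 1), "hours")
print(0.01, round(t_p_days(0.01) * 24 * 60, 1), "minutes")
```

For p = 0.05 this gives about 13.5 hours, in agreement with the statement that in one out of 20 cases the more fortunate player leads for more than 364 days and 10 hours.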

In the next section we shall treat another aspect of the same phe- 
nomenon, and in section 7 we shall illustrate the theory by empirical 
material. 


6. THE NUMBER OF RETURNS TO THE ORIGIN 


The explanation of the arc sine law lies in the fact that frequently enormously
many trials are required before the particle returns to the origin. Geometrically
speaking, the path crosses the x-axis very rarely.

We feel intuitively that if Peter and Paul toss a coin for a long time 2n, the
number of ties (moments when the cumulative scores are equal) should be roughly
proportional to 2n. But this is not so. Actually the number of ties increases in
probability only as $(2n)^{1/2}$; that is, with increasing duration of the game the
frequency of ties decreases rapidly, and the “waves” increase in length. In
analyzing this situation we shall consider the number of returns to zero. It
should be borne in mind that the number of times when the particle actually
crosses from the positive side into the negative or conversely is roughly
one-half the number of returns.

Theorem 1. Let $z_{2n}^{(r)}$ be the probability that up to and including time 2n
the particle returns to zero exactly r times. Then

(6.1)   $z_{2n}^{(r)} = \frac{1}{2^{2n-r}}\binom{2n-r}{n}.$

In particular $z_{2n}^{(0)} = z_{2n}^{(1)} = u_{2n}$ and

(6.2)   $z_{2n}^{(0)} = z_{2n}^{(1)} > z_{2n}^{(2)} > z_{2n}^{(3)} > \cdots.$

In words (6.2) states that, independently of the duration 2n of the game, it is more
likely that no return or exactly one return to zero has occurred than any other number.


Proof. We recall that by formulas (4.4) and (4.5) there exist exactly as many
paths of length 2ν with no return to zero as there are paths with a return to zero at
the last step. Consider now paths of length 2n in which the rth and last return
occurred at some time 2n − 2ν < 2n. The section of length 2ν starting at this last
return can be chosen in as many ways as we can choose an alternative section
starting at the same point (2n − 2ν, 0) of the x-axis and leading to (2n, 0). In other
words: The probability that exactly r returns to zero occur before time 2n equals the
probability that a return occurs at time 2n and that it is preceded by at least r returns.
By theorem 3 of section 4 this means that

(6.3)   $z_{2n}^{(r)} = f_{2n}^{(r)} + f_{2n}^{(r+1)} + f_{2n}^{(r+2)} + \cdots$

with $f_{2n}^{(r)}$ given by (4.11). It is easily verified that

(6.4)   $f_{2n}^{(\nu)} = \frac{1}{2^{2n-\nu}}\binom{2n-\nu}{n} - \frac{1}{2^{2n-\nu-1}}\binom{2n-\nu-1}{n}$

and adding for ν = r, r + 1, ... we get equation (6.1) as asserted. The assertion
(6.2) being a trivial consequence, the theorem is proved.
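
Formula (6.1) lends itself to a direct check by enumeration for short games. The sketch below (with names of our own choosing) counts the returns to zero over all 2^{2n} paths and compares the frequencies with (6.1), using exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import comb


def returns_formula(two_n: int, r: int) -> Fraction:
    """z_{2n}^{(r)} = C(2n - r, n) / 2^{2n - r}, as in (6.1)."""
    n = two_n // 2
    return Fraction(comb(two_n - r, n), 2 ** (two_n - r))


def returns_brute(two_n: int) -> list:
    """Exact distribution of the number of returns to zero, by enumeration."""
    counts = [0] * (two_n // 2 + 1)
    for steps in product((1, -1), repeat=two_n):
        s, returns = 0, 0
        for x in steps:
            s += x
            if s == 0:
                returns += 1
        counts[returns] += 1
    return [Fraction(c, 2 ** two_n) for c in counts]


two_n = 10
brute = returns_brute(two_n)
for r, value in enumerate(brute):
    print(r, value, returns_formula(two_n, r))
```

The printed columns agree, and the first two entries coincide, as (6.2) requires.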


It is again desirable to replace the exact formula (6.1) by a simpler
approximation. For that purpose we rewrite (6.1) in the form

(6.5)   $z_{2n}^{(r)} = u_{2n}\,\frac{\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{r-1}{n}\right)}{\left(1-\frac{1}{2n}\right)\left(1-\frac{2}{2n}\right)\cdots\left(1-\frac{r-1}{2n}\right)}.$

As was pointed out in the proof of the arc sine law, we have
$u_{2n}(\pi n)^{1/2} \to 1$ as n → ∞. From the Taylor expansion of the logarithm,
II(8.10), we see that $\log\{(1-\nu/n)/(1-\nu/2n)\}$ may be approximated by −ν/2n
with an error of the order of magnitude (ν/n)². It follows that with an error
of the magnitude r³/n² we have the approximation

(6.6)   $\log\left(z_{2n}^{(r)}(\pi n)^{1/2}\right) \approx -\frac{1+2+\cdots+(r-1)}{2n} \approx -\frac{r^2}{4n}$

or

(6.7)   $z_{2n}^{(r)} \approx (\pi n)^{-1/2} e^{-r^2/4n}.$

The probability of fewer than k returns, namely
$z_{2n}^{(0)} + z_{2n}^{(1)} + \cdots + z_{2n}^{(k-1)}$, is thus approximated by a Riemann
sum to the integral over $\pi^{-1/2}e^{-x^2/4}$ extended from 0 to $k/n^{1/2}$, the
relative (or percentage) error involved being of the order of magnitude
k³/n². We have thus


Theorem 2. For each fixed a > 0 the probability that up to and including time
2n the particle returns to the origin fewer than $a(2n)^{1/2}$ times tends as n → ∞ to⁸

(6.8)   $f(a) = (2/\pi)^{1/2}\int_0^a e^{-s^2/2}\,ds.$

In particular, the probability that there occur fewer than $0.6745(2n)^{1/2}$ returns is, for
large n, approximately ½.

In chapter VII, section 1, the reader will find a table of the normal distribution
function Φ(a) = ½{1 + f(a)}; from it the values f(a) may be obtained using
f(a) = 2{Φ(a) − ½} for a > 0.


Let a coin be tossed 10,000 times: with probability ½ there will be fewer than 68
returns to zero, of which only one-half represent actual changes of the lead. In
other words, with probability ½ the mean duration of a “wave” between two
consecutive changes of lead is about 300. For 1,000,000 tossings the median number
of returns has increased only by a factor 10, and the mean duration of a wave has
increased to about 3000. The longer the series of trials, the rarer the returns to
zero and the longer the waves.

The probability that in 10,000 tossings of a coin the lead never changes is about 
0.0085, and with the same probability there will be fewer than 10 changes of lead 
in 1,000,000 tossings. 
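
The quality of the approximation (6.8) can be examined numerically. In the sketch below the limit f(a) is expressed through the error function, f(a) = erf(a/√2), which is a standard identity and not taken from the text; the exact probability is obtained by summing (6.1). The function names are ours.

```python
from math import comb, erf, sqrt


def f(a: float) -> float:
    """Limit (6.8): (2/pi)^(1/2) * integral of exp(-s^2/2) from 0 to a."""
    return erf(a / sqrt(2))


def prob_fewer_returns(k: int, two_n: int) -> float:
    """Exact probability of fewer than k returns up to time 2n, summing (6.1)."""
    n = two_n // 2
    return sum(comb(two_n - r, n) / 2 ** (two_n - r) for r in range(k))


two_n = 10_000
k = 68
print("exact :", prob_fewer_returns(k, two_n))
print("limit :", f(k / sqrt(two_n)))
print("f(0.6745) =", f(0.6745))
```

Both figures should come out close to ½, in line with the statement about 68 returns in 10,000 tosses.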


7. AN EXPERIMENTAL ILLUSTRATION 


Figure 5 represents the result of an experiment simulating 10,000 
tosses of a coin; it is the material tabulated in example I(6.c). The 
top line contains the graph of the first 550 trials, and the next two 
lines represent the entire record of 10,000 trials on a smaller scale in 
the x-direction. The scale in the y-direction is the same on the two 
graphs. 

⁸ Readers acquainted with the central limit theorem are warned that the
number of returns is not normally distributed. In (6.8) there appears a truncated
normal distribution with mean $(2/\pi)^{1/2}$ and variance 1 − 2/π.

When looking at the graph most people feel surprised by the length
of the waves between successive crossings of the x-axis (i.e., successive
changes of lead). Nevertheless, the graph represents a comparatively
mild case history and was chosen as the mildest among three available
records. The reader is asked to look at the same graph in the reverse
direction, that is, to take the terminal point as origin. [Analytically,
the reversed path is given by (2.2).] Theoretically, the series as
graphed and the reversed series are equivalent, and each represents a
random walk. The reversed random walk has the following
characteristics. Starting from the origin


the “particle” stays at the 


negative side positive side 

first 7804 steps next 8 steps 

next 2 steps next 54 steps 

next 30 steps next 2 steps 

next 48 steps next 6 steps 

next 2046 steps 

total of 9930 steps total of 70 steps 
fraction of time: 0.9930 fraction of time: 0.007 


This looks absurd, and yet the probability that in 10,000 tosses of a 
perfect coin the lead is on one side for more than 9930 trials and at 
the other for fewer than 70 trials is slightly greater than 0.1. In other 
words, on the average more than one record out of ten will look worse 
than the one just described. By contrast, the probability of a record 
showing a better balance of leads than that of figure 5 is smaller, 
namely about 0.072. 

The record of figure 5 contains 142 returns to the origin among which 
there are 78 actual changes of lead. The reversed series described 
above contains 14 returns of which 8 are changes of lead. Sampling 
of expert opinion has revealed that even trained statisticians feel that 
142 equalizations in 10,000 tosses of a coin is a surprisingly small num- 
ber, and 14 appears quite out of bounds. Actually the probability of 
more than 140 equalizations is about 0.157 while the probability of fewer 
than 14 equalizations is about 0.115. Thus, contrary to intuition, find- 
ing only 14 equalizations is not surprising at all; as far as the number 
of changes of lead is concerned, the reversed series stands on a par with
the original series of figure 5. 


8. MISCELLANEOUS COMPLEMENTS

(a) Analytical Verification of Identities

It is easily verified that

(8.1)   $u_{2n} = (-1)^n \binom{-1/2}{n}, \qquad f_{2n} = (-1)^{n-1} \binom{1/2}{n}.$

The basic identity (4.10) can now be regarded as a special case of
equation II(12.9) for a = ½, b = −½. The same formula shows in addition
that $\sum_{r=0}^{n} u_{2r}u_{2n-2r} = 1$.

Formula (2.8) may be rewritten in terms of $f_{2r}$ instead of $L_{2r+2}$
and reduces to the special case of II(12.9) for a = b = ½. Alternatively,
formula (2.8) may be derived from (4.10) using the identity
n = r + (n − r).


(b) The Position of the Maxima: The Second Arc Sine Law 


We shall say that the path $\{s_1, s_2, \ldots, s_x\}$ has its first maximum at
the place k if

(8.2)   $s_k > 0,\; s_k > s_1,\; \ldots,\; s_k > s_{k-1},\; s_k \ge s_{k+1},\; \ldots,\; s_k \ge s_x.$

In particular, the first maximum is at the place 0 if $s_j \le 0$ for 1 ≤ j ≤ x.
By formula (4.6) the probability that a path of length x = 2n has its
first maximum at 0 equals $u_{2n}$. It follows that also for a path of length
x = 2n − 1 the probability of the first maximum at 0 equals $u_{2n}$.

The event “first maximum at the last place” is the same as $s_j < s_x$
for j = 0, 1, ..., x − 1. For the reversed path (2.2) this means $s_1^* > 0$,
$s_2^* > 0$, ..., $s_x^* > 0$, and the probability of this is given by (4.5),
namely ½$u_{2n}$ for x = 2n and also for x = 2n + 1.

A path of length 2n with a first maximum at k consists of two sec- 
tions: The initial section has its first maximum at the last, or kth, 
place, and the second section has its first maximum at the initial, or 
zero-th, place. Conversely, any two sections with the stated proper- 
ties may be combined to give a path with its first maximum at the kth 
place. We have thus the 


Theorem. The probability that a path of length 2n has its first
maximum at the place ν equals

(8.3)   $\tfrac12 u_{2k}u_{2n-2k}$   if ν = 2k (k = 1, 2, ..., n) or ν = 2k + 1 (k = 0, 1, ..., n − 1),

and $u_{2n}$ if ν = 0.


The remarkable fact is that the probability of finding the first
maximum at either 2k or 2k + 1 equals the probability $p_{2k,2n}$ in (5.1) that the
particle spends 2k out of 2n time units on the positive side. It follows
that the arc sine approximation applies and we can conclude that there
is a strong tendency for the maxima to occur near one or the other of the
endpoints.

The surprising circumstance that the probability distribution
$\{p_{2k,2n}\}$ of leads and the distribution of the position of the maxima are
practically the same is no peculiarity of the coin-tossing game. An
analogous theorem has been proved by E. Sparre Andersen for a large
class of random variables, and the combinatorial basis of his proof is
similar to the argument used above.
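
The distribution of the first maximum can be confirmed by enumeration for short paths. The following sketch uses our own function names and takes the starting point s_0 = 0 as place 0; it compares the enumerated frequencies with ½u_{2k}u_{2n−2k} for places 2k and 2k + 1 and with u_{2n} for place 0.

```python
from fractions import Fraction
from itertools import product
from math import comb


def u(two_n: int) -> Fraction:
    """u_{2n} = C(2n, n) / 2^{2n}."""
    n = two_n // 2
    return Fraction(comb(2 * n, n), 4 ** n)


def first_max_brute(two_n: int) -> list:
    """Exact distribution of the place of the first maximum, by enumeration."""
    counts = [0] * (two_n + 1)
    for steps in product((1, -1), repeat=two_n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        # list.index returns the earliest position attaining the maximum.
        counts[path.index(max(path))] += 1
    return [Fraction(c, 2 ** two_n) for c in counts]


def first_max_formula(place: int, two_n: int) -> Fraction:
    """u_{2n} at place 0, otherwise (1/2) u_{2k} u_{2n-2k} with k = place // 2."""
    if place == 0:
        return u(two_n)
    k2 = 2 * (place // 2)
    return Fraction(1, 2) * u(k2) * u(two_n - k2)


two_n = 8
brute = first_max_brute(two_n)
for place, value in enumerate(brute):
    print(place, value, first_max_formula(place, two_n))
```

Adding the entries for places 2k and 2k + 1 with 0 < k < n reproduces u_{2k}u_{2n−2k}, the distribution of leads.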


(c) A Limit Theorem for First Passages and Returns to the Origin⁹

The estimates used in section 6 may be used to show that for fixed y > 0 the
probability $f_{2n}^{(y)}$ of (4.11) satisfies the asymptotic relation

(8.4)   $f_{2n}^{(y)} \sim \left(\frac{2}{\pi}\right)^{1/2} \frac{y}{(2n)^{3/2}}\, e^{-y^2/4n},$

the sign ~ indicating that the ratio of the two sides tends to unity as n → ∞.
The methods employed for the limit theorems in sections 5 and 6 now lead to the
following conclusion: The probability that the yth return to zero (or the first passage
through y) takes place before time ty² tends, with increasing y, to $1 - f(t^{-1/2})$ with f(a)
defined in equation (6.8).

It follows that with probability near ½ the yth return to zero will occur after time
(2.21...)y², so that the average time between consecutive returns is bound to increase
roughly linearly with y. This should come as a surprise to physicists accustomed
to taking the average of y “measurements on the same quantity” as approximation
to the “true” value. In the present case a closer analysis reveals that in all
likelihood one among the y measurements will be of the same order of magnitude as the
whole sum, namely y².

⁹ This is theorem 3 of chapter XII, section 5, in the first edition. Advanced
readers are advised that $1 - f(t^{-1/2})$ is the so-called positive stable distribution of
order ½.


CHAPTER IV*


Combination of Events


This chapter is concerned with events which are defined in terms
of certain other events $A_1, A_2, \ldots, A_N$. Thus in bridge the event A,
“at least one player has a complete suit,” is the union of the four
events $A_k$, “player number k has a complete suit” (k = 1, 2, 3, 4).
Of the events $A_k$ one, two, or more can occur simultaneously, and,
because of this overlap, the probability of A is not the sum of the four
probabilities $P\{A_k\}$. Given a set of events $A_1, \ldots, A_N$, we shall
show how to compute the probabilities that 0, 1, 2, 3, ... among them
occur.

The material of this chapter is covered in a monograph by M.
Fréchet,¹ to which the reader is referred for further information.


1. UNION OF EVENTS 


If $A_1$ and $A_2$ are two events, then $A = A_1 \cup A_2$ denotes the event
that either $A_1$ or $A_2$ or both occur. By formula I(7.4) we have

(1.1)   $P\{A\} = P\{A_1\} + P\{A_2\} - P\{A_1A_2\}.$

We want to generalize this formula to the case of N events $A_1, A_2, \ldots,
A_N$; that is, we wish to compute the probability of the event that at
least one among the $A_k$ occurs. In symbols this event is
$A = A_1 \cup A_2 \cup \cdots \cup A_N$. For our purpose it is not sufficient to
know the probabilities of the individual events $A_k$, but we must be
given complete information concerning all possible overlaps. This
means that for every pair (i, j), every triple (i, j, k), etc., we must
know the probability of $A_i$ and $A_j$, or $A_i$, $A_j$, and $A_k$, etc., occurring
simultaneously. For convenience of notation we shall denote these
probabilities by the letter p with appropriate subscripts. Thus

(1.2)   $p_i = P\{A_i\}, \quad p_{i,j} = P\{A_iA_j\}, \quad p_{i,j,k} = P\{A_iA_jA_k\}, \ldots.$

* The material of this chapter will not be used explicitly in the sequel. Only
the first theorem is of considerable importance.
¹ Les probabilités associées à un système d'événements compatibles et
dépendants, Actualités scientifiques et industrielles, nos. 859 and 942, Paris,
1940 and 1943.


The order of the subscripts is irrelevant, but for uniqueness we shall
always write the subscripts in increasing order; thus, we write $p_{3,7,11}$ and
not $p_{7,3,11}$. Two subscripts are never equal. For the sum of all p’s
with r subscripts we shall write $S_r$, that is, we define

(1.3)   $S_1 = \sum p_i, \quad S_2 = \sum p_{i,j}, \quad S_3 = \sum p_{i,j,k}, \ldots.$

Here i < j < k < ... ≤ N, so that in the sums each combination
appears once and only once; hence $S_r$ has $\binom{N}{r}$ terms. The last sum, $S_N$,
reduces to the single term $p_{1,2,3,\ldots,N}$, which is the probability of the
simultaneous realization of all N events. For N = 2 we have only the
two terms $S_1$ and $S_2$, and formula (1.1) can be written

(1.4)   $P\{A\} = S_1 - S_2.$

The generalization to an arbitrary number N of events is given in the
following

Theorem. The probability $P_1$ of the realization of at least one among
the events $A_1, A_2, \ldots, A_N$ is given by

(1.5)   $P_1 = S_1 - S_2 + S_3 - S_4 + - \cdots \pm S_N.$

Proof. We prove (1.5) by the so-called method of inclusion and
exclusion (cf. problem 26). To compute $P_1$ we should add the
probabilities of all sample points which are contained in at least one of the $A_i$,
but each point should be taken only once. To proceed systematically
we first take the points which are contained in only one $A_i$, then those
contained in exactly two events $A_i$, and so forth, and finally the points
(if any) contained in all $A_i$. Now let E be any sample point contained
in exactly n among our N events $A_i$. Without loss of generality we
may number the events so that E is contained in $A_1, A_2, \ldots, A_n$ but
not contained in $A_{n+1}, A_{n+2}, \ldots, A_N$. Then P{E} appears as a
contribution to those $p_i, p_{i,j}, p_{i,j,k}, \ldots$ whose subscripts range from 1 to n.
Hence P{E} appears n times as a contribution to $S_1$, and $\binom{n}{2}$ times as
a contribution to $S_2$, etc. In all, when the right-hand side of (1.5) is
expressed in terms of the probabilities of sample points we find P{E}
with the factor

(1.6)   $n - \binom{n}{2} + \binom{n}{3} - + \cdots \pm \binom{n}{n}.$

To prove the theorem we have to show that this number equals 1.
This follows at once on comparing (1.6) with the binomial expansion
of $(1 - 1)^n$ [cf. formula II(8.7)]. The latter starts with 1, and the
terms of (1.6) follow with reversed sign. Hence for every n ≥ 1 the
expression (1.6) equals 1, and this proves the theorem.
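
The theorem may be illustrated on a small finite sample space. The example below is our own and not from the text: the sample points are the integers 1, ..., 30 with equal probabilities, and the events are "divisible by 2", "divisible by 3", and "divisible by 5". The sketch computes the sums S_r of (1.3) and compares the right side of (1.5) with a direct count.

```python
from fractions import Fraction
from itertools import combinations


def S(events: list, r: int, space_size: int) -> Fraction:
    """S_r of (1.3): sum of P{A_i A_j ...} over all r-subsets of the events."""
    total = Fraction(0)
    for combo in combinations(events, r):
        common = set.intersection(*combo)
        total += Fraction(len(common), space_size)
    return total


def prob_at_least_one(events: list, space_size: int) -> Fraction:
    """Right-hand side of (1.5): S_1 - S_2 + S_3 - + ... ± S_N."""
    return sum(
        (-1) ** (r + 1) * S(events, r, space_size)
        for r in range(1, len(events) + 1)
    )


space = range(1, 31)
events = [{x for x in space if x % d == 0} for d in (2, 3, 5)]
direct = Fraction(len(set.union(*events)), len(space))
print(direct, prob_at_least_one(events, len(space)))
```

Here S_1 = 31/30, S_2 = 10/30, S_3 = 1/30, and both computations give 22/30.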


Examples. (a) In a game of bridge let $A_i$ be the event “player
number i has a complete suit.” Then $p_i = 4/\binom{52}{13}$; the event that
both player i and player j have complete suits can occur in 4·3 ways
and has probability $p_{i,j} = \frac{12}{\binom{52}{13}\binom{39}{13}}$; similarly we find

$p_{i,j,k} = \frac{24}{\binom{52}{13}\binom{39}{13}\binom{26}{13}}.$

Finally, $p_{1,2,3,4} = p_{1,2,3}$, since whenever three players have a complete
suit so does the fourth. The probability that some player has a
complete suit is therefore $P_1 = 4p_1 - 6p_{1,2} + 4p_{1,2,3} - p_{1,2,3,4}$. Using
Stirling’s formula, we see that $P_1 = \tfrac14 \cdot 10^{-10}$ approximately. In this
particular case $P_1$ is very nearly the sum of the probabilities of $A_i$, but
this is the exception rather than the rule.
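
With exact integer arithmetic the bridge probability can be evaluated without Stirling's formula. The short sketch below simply inserts the binomial coefficients into (1.5); the variable names are ours.

```python
from fractions import Fraction
from math import comb

c52, c39, c26 = comb(52, 13), comb(39, 13), comb(26, 13)

p1 = Fraction(4, c52)                 # one given player holds a complete suit
p12 = Fraction(12, c52 * c39)         # two given players hold complete suits
p123 = Fraction(24, c52 * c39 * c26)  # three given players hold complete suits
p1234 = p123                          # the fourth hand is then complete as well

P1 = 4 * p1 - 6 * p12 + 4 * p123 - p1234
print(float(P1))       # probability that some player has a complete suit
print(float(4 * p1))   # the sum of the four individual probabilities
```

The two printed numbers agree to many digits, which is the sense in which the overlap terms are negligible here.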

(b) Matches (coincidences). The following problem with many
variants and a surprising solution goes back to Montmort (1708). It has
been generalized by Laplace and many other authors.

Two equivalent decks of N different cards each are put into random
order and matched against each other. If a card occupies the same
place in both decks, we speak of a match (coincidence or rencontre).
Matches may occur at any of the N places and at several places
simultaneously. This experiment may be described in more amusing forms.
For example, the two decks may be represented by a set of N letters
and their envelopes, and a capricious secretary may perform the random
matching. Alternatively we may imagine the hats in a checkroom
mixed and distributed at random to the guests. A match occurs if a
person gets his own hat. It is instructive to venture guesses as to how
the probability of a match depends on N: How does the probability of
a match of hats in a diner with 8 guests compare with the corresponding
probability at a gathering of 10,000 people? It seems surprising
that the probability is practically independent of N and roughly ⅔.
(For less frivolous applications cf. problems 10 and 11.)

The probabilities of having exactly 0, 1, 2, 3, ... matches will be
calculated in section 4. Here we shall derive only the probability $P_1$ of
at least 1 match. For simplicity of expression let us renumber the
cards 1, 2, ..., N in such a way that one deck appears in its natural
order, and assume that each permutation of the second deck has
probability 1/N!. Let $A_k$ be the event that a match occurs at the
kth place. This means that card number k is at the kth place, and
the remaining N − 1 cards may be in an arbitrary order. Clearly
$p_k = (N-1)!/N! = 1/N$. Similarly, for every combination i, j, we
have $p_{i,j} = (N-2)!/N! = 1/N(N-1)$, etc. The sum $S_r$ contains
$\binom{N}{r}$ terms, each of which equals (N − r)!/N!. Hence $S_r = 1/r!$, and
from (1.5) we find the required probability to be

(1.7)   $P_1 = 1 - \frac{1}{2!} + \frac{1}{3!} - + \cdots \pm \frac{1}{N!}.$

Note that $1 - P_1$ represents the first N + 1 terms in the expansion

(1.8)   $e^{-1} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - + \cdots.$

Therefore we have with a good approximation

(1.9)   $P_1 \approx 1 - e^{-1} = 0.63212\ldots$


The degree of approximation is shown in the following table of correct
values of $P_1$:

N =      3         4         5         6         7
$P_1$ =  0.66667   0.62500   0.63333   0.63194   0.63214
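
The values of P_1 for small N are easily reproduced. The sketch below evaluates the alternating sum 1 − 1/2! + 1/3! − ... ± 1/N! in exact fractions and, for comparison, counts the permutations with at least one match directly; the function names are ours.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def p_at_least_one_match(N: int) -> Fraction:
    """P_1 = 1 - 1/2! + 1/3! - + ... ± 1/N!, as in (1.7)."""
    return sum(Fraction((-1) ** (r + 1), factorial(r)) for r in range(1, N + 1))


def p_at_least_one_match_brute(N: int) -> Fraction:
    """Fraction of the N! permutations having at least one fixed place."""
    hits = sum(
        any(card == place for place, card in enumerate(perm))
        for perm in permutations(range(N))
    )
    return Fraction(hits, factorial(N))


for N in range(3, 8):
    print(N, round(float(p_at_least_one_match(N)), 5))
print("limit", round(1 - exp(-1), 5))
```

Already for N = 7 the exact value agrees with 1 − e⁻¹ to four decimals.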


2. APPLICATION TO THE CLASSICAL OCCUPANCY 
PROBLEM 


We now return to the problem of a random distribution of r balls in
n cells, assuming that each arrangement has probability $n^{-r}$. We seek
the probability $p_m(r, n)$ of finding exactly m cells empty.²

² This probability has been derived, by an entirely different method, in problem
II(11.8). Compare also the example in section 3.

Let $A_k$ be the event that cell number k is empty (k = 1, 2, ..., n).
In this event all r balls are placed in the remaining n − 1 cells, and
this can be done in $(n-1)^r$ different ways. Similarly, there are
$(n-2)^r$ arrangements, leaving two preassigned cells empty, etc.
Accordingly

(2.1)   $p_i = \left(1 - \frac{1}{n}\right)^r, \quad p_{i,j} = \left(1 - \frac{2}{n}\right)^r, \quad p_{i,j,k} = \left(1 - \frac{3}{n}\right)^r, \ldots$

and hence for every ν ≤ n

(2.2)   $S_\nu = \binom{n}{\nu}\left(1 - \frac{\nu}{n}\right)^r.$

The probability that at least one cell is empty is given by (1.5), and
hence the probability that all cells are occupied is $1 - S_1 + S_2 - + \cdots$
or

(2.3)   $p_0(r, n) = \sum_{\nu=0}^{n} (-1)^\nu \binom{n}{\nu}\left(1 - \frac{\nu}{n}\right)^r.$

Consider now a distribution in which exactly m cells are empty. These
m cells can be chosen in $\binom{n}{m}$ ways. The r balls are distributed among
the remaining n − m cells so that each of these cells is occupied; the
number of such distributions is $(n-m)^r p_0(r, n-m)$. Dividing by $n^r$
we find for the probability that exactly m cells remain empty

(2.4)   $p_m(r, n) = \binom{n}{m}\left(1 - \frac{m}{n}\right)^r p_0(r, n-m) = \binom{n}{m} \sum_{\nu=0}^{n-m} (-1)^\nu \binom{n-m}{\nu}\left(1 - \frac{m+\nu}{n}\right)^r.$
We have already used the model of r random digits to illustrate the 
random distribution of r things in n = 10 cells. Empty cells corre- 
spond in this case to missing digits: if m cells are empty, 10 — m dif- 
ferent digits appear in the given sequence. Table 1 provides a nu- 
merical illustration. 
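
Probabilities of this kind follow directly from (2.4). The sketch below evaluates the alternating sum in exact fractions, which sidesteps the cancellation that floating-point arithmetic would suffer; the function name is ours.

```python
from fractions import Fraction
from math import comb


def p_empty(m: int, r: int, n: int) -> Fraction:
    """p_m(r, n) of (2.4): exactly m of the n cells stay empty after r balls."""
    total = sum(
        (-1) ** v * comb(n - m, v) * Fraction(n - m - v, n) ** r
        for v in range(n - m + 1)
    )
    return comb(n, m) * total


for m in range(10):
    print(m, f"{float(p_empty(m, 10, 10)):.6f}", f"{float(p_empty(m, 18, 10)):.6f}")
```

For r = n = 10 the entry m = 0 is 10!/10^10, the probability that all ten digits appear.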


TABLE 1

PROBABILITIES $p_m(r, 10)$ ACCORDING TO (2.4)

 m      r = 10       r = 18

 0     0.000 363    0.134 673
 1      .016 330     .385 289
 2      .136 080     .342 987
 3      .355 622     .119 425
 4      .345 144     .016 736
 5      .128 596     .000 876
 6      .017 189     .000 014
 7      .000 672     .000 000
 8      .000 005     .000 000
 9      .000 000     .000 000

$p_m(r, 10)$ is the probability that exactly m of the digits 0, 1, ..., 9 will not
appear in a sequence of r random digits.

It is clear that a direct numerical evaluation of (2.4) is limited to
the case of relatively small n and r. On the other hand, the occupancy
problem is of particular interest when n is large. If 10,000 balls are
distributed in 1000 cells, is there any chance of finding an empty cell?
In a group of 2000 people, is there any chance of finding a day in the
year which is not a birthday? Fortunately, questions of this kind can
be answered by means of a remarkably simple approximation with an
error which tends to zero as n → ∞. This approximation and the
argument leading to it are typical of many limit theorems in probability.

Our purpose, then, is to discuss the limiting form of the formula (2.4)
as n → ∞ and r → ∞. The relation between r and n is, in principle,
arbitrary. However, the ratio r/n represents the average number of
things per cell. If it is excessively large, then we cannot expect any
empty cells; in this case $p_0(r, n)$ is near unity and all $p_m(r, n)$ with
m ≥ 1 are small. On the other hand, if r/n tends to zero, then
practically all cells must be empty, and in this case $p_m(r, n) \to 0$ for every
fixed m. Therefore only the intermediate case is of real interest.

We begin by estimating the quantity $S_\nu$ of formula (2.2). Since
$(n-\nu)^\nu < (n)_\nu < n^\nu$, we have clearly

(2.5)   $n^\nu\left(1 - \frac{\nu}{n}\right)^{\nu+r} < \nu!\,S_\nu < n^\nu\left(1 - \frac{\nu}{n}\right)^{r}.$

Using the double inequality II(8.12) with t = ν/n, we get

(2.6)   $\left\{n e^{-(\nu+r)/(n-\nu)}\right\}^\nu < \nu!\,S_\nu < \left\{n e^{-r/n}\right\}^\nu.$

Now put for abbreviation

(2.7)   $n e^{-r/n} = \lambda$

and suppose that r and n increase in such a way that λ remains bounded.
Then, for each fixed ν, the ratio of the extreme members in (2.6) tends
to unity, and we conclude that

(2.8)   $S_\nu < \frac{\lambda^\nu}{\nu!}$   and   $\frac{\lambda^\nu}{\nu!} - S_\nu \to 0.$

It follows that

(2.9)   $p_0(r, n) - \sum_{\nu=0}^{\infty} (-1)^\nu \frac{\lambda^\nu}{\nu!} \to 0$

or $p_0(r, n) - e^{-\lambda} \to 0$. Now the factor of $p_0(r, n-m)$ in (2.4) may
be rewritten as $S_m$, and we have therefore for each fixed m

(2.10)   $p_m(r, n) - e^{-\lambda}\frac{\lambda^m}{m!} \to 0.$

This completes the proof of the

Theorem.³ If n and r tend to infinity so that $\lambda = ne^{-r/n}$ remains
bounded, then (2.10) holds for each fixed m.


The approximating expressions

(2.11)   $p(m; \lambda) = e^{-\lambda}\frac{\lambda^m}{m!}$

define the so-called Poisson distribution, which is of great importance
and describes a variety of phenomena; it will be studied in chapter VI.

In practice we may use p(m; λ) as an approximation whenever n is
great. For moderate values of n an estimate of the error is required,
but we shall not enter into it.


Examples. (a) Table 2 gives the approximate probabilities of find- 
ing m cells empty when the number of cells is 1000 and the number of 
balls varies from 5000 to 9000. For r = 5000 the median value of the 
number of empty cells is six: seven or more empty cells are about as 
probable as six or fewer. Even with 9000 balls in 1000 cells we have 
about one chance in nine to find an empty cell. 

(b) In birthday statistics [example II(3.d)] n = 365, and r is the
number of people. For r = 1900 we find λ = 2, approximately. In a
village of 1900 people the probabilities $P_{[m]}$ of finding m days of the year
which are not birthdays are approximately as follows:

$P_{[0]}$ = 0.135,  $P_{[1]}$ = 0.271,  $P_{[2]}$ = 0.271,  $P_{[3]}$ = 0.180,
$P_{[4]}$ = 0.090,  $P_{[5]}$ = 0.036,  $P_{[6]}$ = 0.012,  $P_{[7]}$ = 0.003.
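
The Poisson approximation is simple enough to evaluate in a few lines. The sketch below (function names ours) computes λ = ne^{−r/n} for the birthday example and for 9000 balls in 1000 cells, together with the Poisson expressions (2.11).

```python
from math import exp, factorial


def poisson_lambda(r: int, n: int) -> float:
    """lambda = n * exp(-r / n), as in (2.7)."""
    return n * exp(-r / n)


def poisson(m: int, lam: float) -> float:
    """p(m; lambda) = exp(-lambda) * lambda^m / m!, as in (2.11)."""
    return exp(-lam) * lam ** m / factorial(m)


lam = poisson_lambda(1900, 365)
print("lambda for 1900 people:", round(lam, 3))
print([round(poisson(m, 2.0), 3) for m in range(8)])

# Chance of at least one empty cell with 9000 balls in 1000 cells.
print(1 - poisson(0, poisson_lambda(9000, 1000)))
```

The last line gives roughly 0.116, that is, about one chance in nine.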


The probability of finding exactly m cells each containing exactly k
balls can be derived in the same way. As von Mises has shown, this
probability can again be approximated by the Poisson expression (2.11),
only this time λ must be defined by

(2.12)   $\lambda = n e^{-r/n}\left(\frac{r}{n}\right)^k / k!.$

³ Due (with a different proof) to R. von Mises, Über Aufteilungs- und
Besetzungswahrscheinlichkeiten, Revue de la Faculté des Sciences de l'Université
d'Istanbul, N.S., vol. 4 (1939), pp. 145-163.


3. THE REALIZATION OF m AMONG N EVENTS

The theorem of section 1 can be strengthened as follows.

Theorem. For any integer m with 1 ≤ m ≤ N the probability $P_{[m]}$
that exactly m among the N events $A_1, \ldots, A_N$ occur simultaneously is
given by

(3.1)   $P_{[m]} = S_m - \binom{m+1}{m} S_{m+1} + \binom{m+2}{m} S_{m+2} - + \cdots \pm \binom{N}{m} S_N.$

Note: According to (1.5), the probability $P_{[0]}$ that none among the
$A_i$ occurs is

(3.2)   $P_{[0]} = 1 - P_1 = 1 - S_1 + S_2 - S_3 \pm \cdots \mp S_N.$

This shows that (3.1) gives the correct value also for m = 0 provided
we put $S_0 = 1$.


Proof. We proceed as in the proof of (1.5). Let E be an arbitrary
sample point, and suppose that it is contained in exactly n among the
N events $A_i$. Then P{E} appears as a contribution to $P_{[m]}$ only if
n = m. To investigate how P{E} contributes to the right side of (3.1),
note that P{E} appears in the sums $S_1, S_2, \ldots, S_n$ but not in $S_{n+1},
\ldots, S_N$. It follows that P{E} does not contribute to the right side in
(3.1) if n < m. If n = m, then P{E} appears in one and only one
term of $S_m$. To complete the proof of the theorem it remains to show
that for n > m the contributions of P{E} to the terms $S_m, S_{m+1}, \ldots, S_n$
on the right in (3.1) cancel. Now out of the n events containing E we
can form $\binom{n}{k}$ k-tuplets; hence P{E} appears in $S_k$ with the factor $\binom{n}{k}$.
For n > m the total contribution of P{E} to the right side in (3.1) is
therefore

(3.3)   $\binom{n}{m} - \binom{m+1}{m}\binom{n}{m+1} + \binom{m+2}{m}\binom{n}{m+2} - + \cdots.$

However, $\binom{m+\nu}{m}\binom{n}{m+\nu} = \binom{n}{m}\binom{n-m}{\nu}$, and hence (3.3)
reduces to

(3.4)   $\binom{n}{m}\left\{\binom{n-m}{0} - \binom{n-m}{1} + - \cdots \pm \binom{n-m}{n-m}\right\}.$

Within the braces we have the binomial expansion of $(1 - 1)^{n-m}$ so
that (3.3) vanishes, as asserted.

Example. The reader is asked to verify that a substitution from
formula (2.2) into (3.1) leads directly to formula (2.4).
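
Formula (3.1) can also be tested on a concrete sample space. The example is our own: the sample points are the integers 1, ..., 30, equally likely, with the events "divisible by 2", "divisible by 3", "divisible by 5". The sketch compares (3.1), with S_0 = 1, against a direct count of the points lying in exactly m events.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def S(events: list, k: int, space_size: int) -> Fraction:
    """S_k of (1.3), with the convention S_0 = 1."""
    if k == 0:
        return Fraction(1)
    return sum(
        Fraction(len(set.intersection(*combo)), space_size)
        for combo in combinations(events, k)
    )


def prob_exactly(events: list, m: int, space_size: int) -> Fraction:
    """Right-hand side of (3.1): sum over k >= m of (-1)^(k-m) C(k, m) S_k."""
    return sum(
        (-1) ** (k - m) * comb(k, m) * S(events, k, space_size)
        for k in range(m, len(events) + 1)
    )


space = range(1, 31)
events = [{x for x in space if x % d == 0} for d in (2, 3, 5)]
for m in range(4):
    direct = Fraction(sum(1 for x in space if sum(x in A for A in events) == m), 30)
    print(m, direct, prob_exactly(events, m, 30))
```

The direct counts are 8, 14, 7, and 1 points out of 30 for m = 0, 1, 2, 3.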

4. APPLICATION TO MATCHING AND GUESSING

In example (1.b) we considered the matching of two decks of cards
and found that $S_k = 1/k!$. Substituting into (3.1), we find the
following result.

In a random matching of two equivalent decks of N distinct cards the
probability $P_{[m]}$ of having exactly m matches is given by

(4.1)
$P_{[0]} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + - \cdots \pm \frac{1}{(N-2)!} \mp \frac{1}{(N-1)!} \pm \frac{1}{N!}$
$P_{[1]} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + - \cdots \pm \frac{1}{(N-2)!} \mp \frac{1}{(N-1)!}$
$P_{[2]} = \frac{1}{2!}\left\{1 - 1 + \frac{1}{2!} - \frac{1}{3!} + - \cdots \pm \frac{1}{(N-2)!}\right\}$
$P_{[3]} = \frac{1}{3!}\left\{1 - 1 + \frac{1}{2!} - \frac{1}{3!} + - \cdots \mp \frac{1}{(N-3)!}\right\}$
. . . . . . . . . . . . . . .
$P_{[N-2]} = \frac{1}{(N-2)!}\left\{1 - 1 + \frac{1}{2!}\right\}$
$P_{[N-1]} = \frac{1}{(N-1)!}\{1 - 1\} = 0$
$P_{[N]} = \frac{1}{N!}.$

The last relation is obvious. The vanishing of $P_{[N-1]}$ expresses the
impossibility of having N − 1 matches without having all N cards in
the same order.

The braces on the right in (4.1) contain the initial terms of the
expansion of $e^{-1}$. For large N we have therefore approximately

(4.2)   $P_{[m]} \approx \frac{e^{-1}}{m!}.$

TABLE 3

PROBABILITIES OF m CORRECT GUESSES IN CALLING A DECK OF N DISTINCT CARDS

      |     N = 3     |     N = 4     |     N = 5     |     N = 6     |       N = 10       |
  m   | $P_{[m]}$ $b_m$ | $P_{[m]}$ $b_m$ | $P_{[m]}$ $b_m$ | $P_{[m]}$ $b_m$ | $P_{[m]}$   $b_m$    |  $p_m$

  0   | 0.333  0.296  | 0.375  0.316  | 0.367  0.328  | 0.368  0.335  | 0.36788  0.34868   | 0.367879
  1   |  .500   .444  |  .333   .422  |  .375   .410  |  .367   .402  |  .36788   .38742   |  .367879
  2   |  ...    .222  |  .250   .211  |  .167   .205  |  .187   .201  |  .18394   .19371   |  .183940
  3   |  .167   .037  |  ...    .047  |  .083   .051  |  .056   .053  |  .06131            |  .061313
  4   |               |  .042   .004  |  ...    .006  |  .021   .008  |  .01534            |  .015328
  5   |               |               |  .008   .000  |  ...    .001  |  .00306            |  .003066
  6   |               |               |               |  .001   .000  |  .00052            |  .000511
  7   |               |               |               |               |  .00007            |  .000073
  8   |               |               |               |               |  .00001            |  .000009
  9   |               |               |               |               |                    |  .000001
 10   |               |               |               |               |                    |  .000000

The $P_{[m]}$ are given by (4.1), the $b_m$ by (4.4). The last column gives the Poisson
limits (4.3).


In table 3 the columns headed $P_{[m]}$ give the exact values of $P_{[m]}$ for
N = 3, 4, 5, 6, 10. The last column gives the limiting values

(4.3)  $p_m = \dfrac{e^{-1}}{m!}$.

The approximation of $p_m$ to $P_{[m]}$ is rather good even for moderate
values of N.
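As a numerical check on (4.1) and on the Poisson approximation, the following Python sketch evaluates $P_{[m]}$ exactly and compares it with a direct count over all N! arrangements. The function names are illustrative assumptions, not notation from the text, and the brute-force count is practical only for small N.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def matching_exact(N, m):
    """P_[m] from (4.1): (1/m!) * sum_{k=0}^{N-m} (-1)^k / k!."""
    tail = sum(Fraction((-1) ** k, factorial(k)) for k in range(N - m + 1))
    return tail / factorial(m)

def matching_brute_force(N, m):
    """Fraction of the N! arrangements with exactly m cards in their own place."""
    hits = sum(1 for p in permutations(range(N))
               if sum(p[i] == i for i in range(N)) == m)
    return Fraction(hits, factorial(N))

def poisson_limit(m):
    """Limiting value e^{-1} / m! of (4.2) and (4.3)."""
    return exp(-1) / factorial(m)

# Exact value next to the limiting value for N = 6.
for m in range(7):
    print(m, round(float(matching_exact(6, m)), 3), round(poisson_limit(m), 6))
```

Exact rational arithmetic is used so that the agreement with the direct count is an identity rather than a rounding coincidence.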

For the numbers $p_m$ defined by (4.3) we have
$\sum p_m = e^{-1}\left(1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots\right) = e^{-1}e = 1$.
Accordingly, the $p_m$ may be interpreted as probabilities. Note that
(4.3) represents the special case $\lambda = 1$ of the Poisson
distribution (2.11).

Formulas (4.1) are useful in testing guessing abilities. In wine
tasting, psychic experiments, etc., the subject is asked to call an unknown
order of N things, say, cards. Any actual insight on the part of the
subject will appear as a departure from randomness. To judge the
amount of insight we must appraise the probability of turns of good
luck. Now chance guesses can be made according to several systems
among which we mention three extreme possibilities. (1) The subject
sticks to one card and keeps calling it. With this system he is sure to
have one, and only one, correct guess in each series; chance fluctuations
are eliminated. (2) The subject calls each card once so that each series
of N guesses corresponds to a rearrangement of the deck. If this
system is applied without insight, formulas (4.1) should apply. (3) A
third possibility is that N guesses are made absolutely independently
of each other. There are $N^N$ possible arrangements. It is true that
every person has fixed mental habits and is prone to call certain
patterns more frequently than others, but in first approximation we may
assume all $N^N$ arrangements to be equally probable. Since m correct
and N - m incorrect guesses can be arranged in $\binom{N}{m}(N-1)^{N-m}$
different ways, the probability of exactly m correct guesses is now

(4.4)  $b_m = \binom{N}{m}\dfrac{(N-1)^{N-m}}{N^N}$.

[This is a special case of the binomial distribution and has been derived
in example II(4.c).]


Table 3 gives a comparison of the probabilities of success when
guesses are made in accordance with system (2) or (3). To judge the
merits of the two methods we require the theory of mean values and 
probable fluctuations. It turns out that the average number of correct 
chance guesses is one under all systems; the chance fluctuations are 
somewhat larger under system (2) than (3). A glance at table 3 will 
show that in practice the differences will not be excessive. 
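The comparison of systems (2) and (3) can be illustrated numerically. The sketch below evaluates (4.1) and (4.4) for N = 10 and computes the average number of correct guesses under each system. As a stand-in for the "probable fluctuations" it uses the variance, which is an assumption of this sketch, since the text defers that theory to a later chapter.

```python
from fractions import Fraction
from math import comb, factorial

def p_matching(N, m):
    """System (2): P_[m] from (4.1)."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(N - m + 1)) / factorial(m)

def b_independent(N, m):
    """System (3): b_m from (4.4)."""
    return Fraction(comb(N, m) * (N - 1) ** (N - m), N ** N)

def mean_var(dist):
    """Mean and variance of a distribution given as a list indexed by m."""
    mean = sum(m * p for m, p in enumerate(dist))
    var = sum((m - mean) ** 2 * p for m, p in enumerate(dist))
    return mean, var

N = 10
system2 = [p_matching(N, m) for m in range(N + 1)]
system3 = [b_independent(N, m) for m in range(N + 1)]
print("system (2):", mean_var(system2))
print("system (3):", mean_var(system3))
```

Both means equal one, and the variance is slightly larger under system (2), in line with the remark above.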


5. MISCELLANY 


(a) The Realization of at Least m Events 


With the notations of section 3 the probability $P_m$ that m or more of
the events $A_1, \ldots, A_N$ occur simultaneously is given by

(5.1)  $P_m = P_{[m]} + P_{[m+1]} + \cdots + P_{[N]}$.


To find a formula for $P_m$ in terms of $S_k$ it is simplest to proceed by
induction, starting with formula (1.5) and using the recurrence relation
$P_{m+1} = P_m - P_{[m]}$. We get for m ≥ 1

(5.2)  $P_m = S_m - \binom{m}{m-1}S_{m+1} + \binom{m+1}{m-1}S_{m+2} - \binom{m+2}{m-1}S_{m+3} + - \cdots \pm \binom{N-1}{m-1}S_N$.

It is also possible to derive (5.2) directly, using the argument which
led to (3.1).


(b) Further Identities 


The coefficients $S_\nu$ can be expressed in terms of either $P_{[k]}$ or $P_k$ as
follows

(5.3)  $S_\nu = \sum_{k=\nu}^{N} \binom{k}{\nu} P_{[k]}$

and

(5.4)  $S_\nu = \sum_{k=\nu}^{N} \binom{k-1}{\nu-1} P_k$.

Indication of proof. For given values of $P_{[m]}$ the equations (3.1)
may be taken as linear equations in the unknowns $S_\nu$, and we have to
prove that (5.3) represents the unique solution. If (5.3) is introduced
into the expression (3.1) for $P_{[m]}$, the coefficient of $P_{[k]}$ (m ≤ k ≤ N)
to the right is found to be

(5.5)  $\sum_{\nu=m}^{k} (-1)^{\nu-m}\binom{\nu}{m}\binom{k}{\nu} = \binom{k}{m}\sum_{\nu=m}^{k} (-1)^{\nu-m}\binom{k-m}{\nu-m}$.

If k = m this expression reduces to 1. If k > m the sum is the binomial
expansion of $(1-1)^{k-m}$ and therefore vanishes. Hence the
substitution (5.3) reduces (3.1) to the identity $P_{[m]} = P_{[m]}$. The uniqueness
of the solution of (3.1) follows from the fact that each equation
introduces only one new unknown, so that the $S_\nu$ can be computed
recursively. The truth of (5.4) can be proved in a similar way.
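The relations (5.1) through (5.4) can be checked on a concrete case. The following Python sketch takes the matching problem of example (1.b), where $S_k = 1/k!$, computes $P_{[m]}$ from (3.1) and $P_m$ from (5.2), and then tests (5.1), (5.3) and (5.4) in exact arithmetic. The choice N = 6 and all variable names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial

N = 6
S = [Fraction(1, factorial(k)) for k in range(N + 1)]   # S_k = 1/k! (matching)

# P_[m] from (3.1), for m = 0, ..., N
P_exact = [sum((-1) ** (k - m) * comb(k, m) * S[k] for k in range(m, N + 1))
           for m in range(N + 1)]

# P_m from (5.2), for m = 1, ..., N
P_least = {m: sum((-1) ** (k - m) * comb(k - 1, m - 1) * S[k]
                  for k in range(m, N + 1))
           for m in range(1, N + 1)}

def check_identities():
    """Return the truth of (5.1), (5.3) and (5.4) for the values above."""
    ok_51 = all(P_least[m] == sum(P_exact[m:]) for m in range(1, N + 1))
    ok_53 = all(S[v] == sum(comb(k, v) * P_exact[k] for k in range(v, N + 1))
                for v in range(1, N + 1))
    ok_54 = all(S[v] == sum(comb(k - 1, v - 1) * P_least[k] for k in range(v, N + 1))
                for v in range(1, N + 1))
    return ok_51, ok_53, ok_54

print(check_identities())
```

A check of this kind does not replace the proof, but it guards against misplaced indices when the formulas are copied.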


(c) Bonferroni’s Inequalities

A string of inequalities both for $P_{[m]}$ and for $P_m$ can be obtained in
the following way. If in either (3.1) or (5.2) only the terms involving
$S_m, S_{m+1}, \ldots, S_{m+r-1}$ are retained while the terms involving
$S_{m+r}, S_{m+r+1}, \ldots, S_N$ are dropped, then the error (i.e., true value minus
approximation) has the sign of the first omitted term [namely, $(-1)^r$] and is
smaller in absolute value. Thus, for r = 1 and r = 2:

(5.6)  $S_m - (m+1)S_{m+1} \le P_{[m]} \le S_m$

(5.7)  $S_m - mS_{m+1} \le P_m \le S_m$.

Indication of proof. To prove the statement for (3.1) it must be
shown that

(5.8)  $\sum_{\nu=t}^{N} (-1)^{\nu-t}\binom{\nu}{m}S_\nu \ge 0$

for every t. Now use (5.3) to write the left side as a linear combination
of the $P_{[k]}$. For t ≤ k ≤ N the coefficient of $P_{[k]}$ equals

$\sum_{\nu=t}^{k} (-1)^{\nu-t}\binom{\nu}{m}\binom{k}{\nu} = \binom{k}{m}\sum_{\nu=t}^{k} (-1)^{\nu-t}\binom{k-m}{\nu-m}$.

The last sum equals $\binom{k-m-1}{t-m-1}$ and is therefore positive [problem
II(12.13)]. For further inequalities the reader is referred to Fréchet’s
monograph cited at the beginning of the chapter.
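The inequalities (5.6) and (5.7) can be tried out on the matching problem, where $S_k = 1/k!$ for k ≤ N. The sketch below is only an illustration under that assumption; the function name is hypothetical, and $S_{N+1}$ is taken to be zero because no N + 1 of the N events can occur together.

```python
from fractions import Fraction
from math import comb, factorial

def bonferroni_holds(N):
    """Check (5.6) and (5.7) for every m when S_k = 1/k! (matching problem)."""
    S = [Fraction(1, factorial(k)) for k in range(N + 1)] + [Fraction(0)]
    for m in range(1, N + 1):
        # P_[m] from (3.1) and P_m from (5.2)
        exact = sum((-1) ** (k - m) * comb(k, m) * S[k] for k in range(m, N + 1))
        least = sum((-1) ** (k - m) * comb(k - 1, m - 1) * S[k]
                    for k in range(m, N + 1))
        if not (S[m] - (m + 1) * S[m + 1] <= exact <= S[m]):    # (5.6)
            return False
        if not (S[m] - m * S[m + 1] <= least <= S[m]):          # (5.7)
            return False
    return True

print(bonferroni_holds(6), bonferroni_holds(10))
```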


6. PROBLEMS FOR SOLUTION 
Note: Assume in each case that all possible arrangements have the same probability. 


1. Ten pairs of shoes are in a closet. Four shoes are selected at random. 
Find the probability that there will be at least one pair among the four shoes 
selected. 

2. Five dice are thrown. Find the probability that at least three of them 
show the same face. (Verify by the methods of chapter II, section 5.) 

3. Find the probability that in five tossings a coin falls heads at least three 
times in succession. 

4. Solve problem 3 for a head-run of at least length five in ten tossings. 

5. Solve problems 3 and 4 for ace runs when a die is used instead of a coin. 

6. Two dice are thrown r times. Find the probability $p_r$ that each of the
six combinations (1, 1), ..., (6, 6) appears at least once.

7. Quadruples in a bridge hand. By a quadruple we shall understand four 
cards of the same face value, so that a bridge hand of thirteen cards may con- 
tain 0, 1, 2, or 3 quadruples. Calculate the corresponding probabilities. 

8. Sampling with replacement. A sample of size r is taken from a
population of n people. Find the probability $u_r$ that N given people will all be
included in the sample. [This is problem II(11.12).]

9. Sampling without replacement. Answer problem 8 for this case and show 
that 8 holds with ur > pY. (This is problem II(11.3), but the present method 
leads to an entirely different formula.) 

10. In the general expansion of a determinant of order N the number of
terms containing one or more diagonal elements is $N!P_1$ with $P_1$ defined by (1.7).

11. The number of ways in which 8 rooks can be placed on a chessboard
so that none can take another and that none stands on the white diagonal is
$8!(1 - P_1)$, where $P_1$ is defined by (1.7) with N = 8.

12. A sampling (coupon collector’s) problem. A pack of cards consists of s
identical series, each containing n cards numbered 1, 2, ..., n. A random
sample of r ≥ n cards is drawn from the pack without replacement.
Calculate the probability $u_r$ that each number is represented in the sample. (Applied
to a deck of bridge cards we get for s = 4, n = 13 the probability that a hand
of r cards contains all 13 values; and for s = 13, n = 4 we get the probability
that all four suits are represented.)

13. Continuation. Show that as s → ∞ one has $u_r \to p_0(r, n)$ where the
latter expression is defined in (2.3). This means that in the limit our sampling
becomes random sampling with replacement from the population of the
numbers 1, 2, ..., n.


14. Continuation. From the result of problem 12 conclude that

$\sum_{k=0}^{n} (-1)^k \binom{n}{k} (ns - ks)_r = 0$

if r < n and for r = n

$\sum_{k=0}^{n} (-1)^k \binom{n}{k} (ns - ks)_n = s^n n!$.

Verify this by evaluating the rth derivative, at z = 0, of

$\dfrac{1}{(1-z)^{ns-r+1}}\{1 - (1-z)^s\}^n$.


15. In the sampling problem 12 find the probability that it will take exactly
r drawings to get a sample containing all numbers. Pass to the limit as s → ∞.

16. A cell contains N chromosomes, between any two of which an interchange
of parts may occur. If r interchanges occur (which can happen in $\binom{N}{2}^r$
distinct ways), find the probability that exactly m chromosomes will be
involved.*

17. Find the probability that exactly k suits will be missing in a poker hand.

18. Find the probability that a hand of thirteen bridge cards contains the
ace-king pairs of exactly k suits.

19. Multiple matching. Two similar decks of N distinct cards each are
matched simultaneously against a similar target deck. Find the probability
$u_m$ of having exactly m double matches. Show that $u_0 \to 1$ as N → ∞ (which
implies that $u_m \to 0$ for m ≥ 1).

20. Multiple matching. The procedure of the preceding problem is modified
as follows. Out of the 2N cards N are chosen at random, and only these
are matched against the target deck. Find the probability of no match. Prove
that it tends to 1/e as N → ∞.

21. Multiple matching. Answer problem 20 if r decks are used instead of
two.

* For N = 6 see D. G. Catcheside, D. E. Lea, and J. M. Thoday, Types of
chromosome structural change introduced by the irradiation of Tradescantia
microspores, Journal of Genetics, vol. 47 (1945-46), pp. 113-149.

22. In the classical occupancy problem, the probability $P_{[m]}(k)$ of finding
exactly m cells occupied by exactly k things is

$P_{[m]}(k) = \dfrac{(-1)^m\, n!\, r!}{m!\, n^r} \sum_j (-1)^j \dfrac{(n-j)^{r-jk}}{(j-m)!\,(n-j)!\,(r-jk)!\,(k!)^j}$,

the summation extending over those j ≥ m for which j ≤ n and kj ≤ r.
23. Prove the last statement of section 2 for the case k = 1.
24. Using (3.1), derive the probability of finding exactly m empty cells in
the case of Bose-Einstein statistics.
25. Verify that the formula obtained in 24 checks with formula II(11.14).
26. Prove formula (1.5) by induction on N.


CHAPTER V 


Conditional Probability. 
Stochastic Independence 


1. CONDITIONAL PROBABILITY 


The notion of conditional probability is a basic tool of probability 
theory, and it is unfortunate that its great simplicity is somewhat ob- 
scured by a singularly clumsy terminology. The following considera- 
tions lead in a natural way to the formal definition. 


Preparatory Examples 


Suppose a population of N people includes $N_A$ colorblind people and
$N_H$ females. Let the events that a person chosen at random is
colorblind and a female be A and H, respectively. Then (cf. the definition
of random choice, chapter II, section 2)

(1.1)  $P\{A\} = \dfrac{N_A}{N}, \qquad P\{H\} = \dfrac{N_H}{N}$.


Instead of the entire population, we may investigate the female
subpopulation and require the probability that a female chosen at random
be colorblind. This probability is $N_{HA}/N_H$, where $N_{HA}$ is the number
of colorblind females. We have here no new notion, but we need a new
notation to designate which particular subpopulation is under
investigation. The most widely adopted symbol is P{A|H}; it may be read
“the probability of the event A (colorblindness), assuming the event H
(that the person chosen is female).” In symbols:

(1.2)  $P\{A \mid H\} = \dfrac{N_{HA}}{N_H} = \dfrac{P\{AH\}}{P\{H\}}$.

Obviously every subpopulation may be considered as a population
in its own right; we speak of a subpopulation merely for convenience
of language to indicate that we have a larger population in the back of
our minds. An insurance company may be interested in the frequency 
of damages of a fixed amount caused by lightning (event A). Presumably
this company has several categories of insured objects such as
industrial, urban, rural, etc. Studying separately the damages to
industrial objects means to study the event A only in conjunction with the
event H: “Damage is to an industrial object.” Formula (1.2) again
applies in an obvious manner. Note, however, that for an insurance 
company specializing in industrial objects the category H coincides 
with the whole sample space, and P{A|H} reduces to P{A}. 

Finally consider the bridge player North. Once the cards are dealt, 
he knows his hand and is interested only in the distribution of the re- 
maining 39 cards. It is legitimate to introduce the aggregate of all 
possible distributions of these 39 cards as a new sample space, but it 
is obviously more convenient to consider them in conjunction with the 
13 cards in North’s hand (event H) and to speak of the probability of 
an event A (say South’s having two aces) assuming the event H. For- 


mula (1.2) again applies. 

By analogy with (1.2) we now introduce the formal

Definition. Let H be an event with positive probability. For an
arbitrary event A we shall write

(1.3)  $P\{A \mid H\} = \dfrac{P\{AH\}}{P\{H\}}$.

The quantity so defined will be called the conditional probability of A on
the hypothesis H (or for given H). When all sample points have equal
probabilities, P{A|H} is the ratio $N_{AH}/N_H$ of the number of sample
points common to A and H, to the number of points in H.


Conditional probabilities remain undefined when the hypothesis has 
zero probability. This is of no consequence in the case of discrete 
sample spaces but is important in the general theory.

Though the symbol P{A|H} itself is practical, its phrasing in words
is so unwieldy that in practice less formal descriptions are used. Thus
in our introductory example we referred to the probability of a female’s
being colorblind instead of saying “the conditional probability of a
randomly chosen person’s being colorblind on the hypothesis that the
person is a female.” Often the phrase “on the hypothesis H” is replaced
by “if it is known that H occurred.” In short, our formulas and
symbols are unequivocal, but phrasings in words are often informal and
must be properly interpreted.

Sometimes for stylistic clarity probabilities in the sample space are called
absolute probabilities in contradistinction to conditional ones. Strictly
speaking, the adjective “absolute” is redundant and will be omitted.

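Taking conditional probabilities of various events with respect to a
particular hypothesis H amounts to choosing H as a new sample space.
Consequently all general theorems on probabilities are valid
also for conditional probabilities with respect to any particular hypothesis
H. As an example we mention the fundamental relation for the
probability of the occurrence of either A or B or both. We have

(1.4)  $P\{A \cup B \mid H\} = P\{A \mid H\} + P\{B \mid H\} - P\{AB \mid H\}$.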


Similarly, all theorems of chapter IV concerning probabilities of the 


realization of m among N events carry over to conditional probabilities, 
but we shall not need them. 


Formula (1.3) is often used in the form 
(1.5)  $P\{AH\} = P\{A \mid H\} \cdot P\{H\}$.


This is the so-called theorem on compound probabilities. To generalize 


it to three events A, B, C we first take H = BC as hypothesis and then 
apply (1.5) once more; it follows that 


(1.6)  $P\{ABC\} = P\{A \mid BC\} \cdot P\{B \mid C\} \cdot P\{C\}$.


A further generalization to four or more events is straightforward. 

We conclude with a simple formula which is frequently useful. Let
$H_1, \ldots, H_n$ be a set of mutually exclusive events of which one
necessarily occurs (that is, the union of $H_1, \ldots, H_n$ is the entire sample
space). Then any event A can occur only in conjunction with some
$H_j$, or in symbols,

(1.7)  $A = AH_1 \cup AH_2 \cup \cdots \cup AH_n$.

Since the $AH_j$ are mutually exclusive, their probabilities add.
Applying (1.5) to $H = H_j$ and adding, we get

(1.8)  $P\{A\} = \sum P\{A \mid H_j\} \cdot P\{H_j\}$.

This formula is useful because an evaluation of the conditional
probabilities $P\{A \mid H_j\}$ is sometimes easier than a direct calculation of P{A}.

Examples. (a) Sampling without replacement. From a population
of the n elements 1, 2, ..., n an ordered sample is taken. Let i and j
be two different elements. Assuming that i is the first element drawn
(event H), what is the probability that the second element is j (event
A)? Clearly P{AH} = 1/[n(n - 1)] and P{A|H} = 1/(n - 1). This
expresses the fact that the second choice refers to a population of n - 1
elements, each of which has the same probability of being chosen. In
fact, the most natural definition of random sampling is: “Whatever the
first r choices, at the (r+1)st step each of the remaining n - r elements
has probability 1/(n - r) to be chosen.” This definition is equivalent
to that given in chapter II, but we could not have stated it earlier
since it involves the notion of conditional probability.

(b) Four balls are placed successively into four cells, all $4^4$
arrangements being equally probable. Given that the first two balls are in
different cells (event H), what is the probability that one cell contains
exactly three balls (event A)? Given H, the event A can occur in two
ways, and so $P\{A \mid H\} = 2 \cdot 4^{-2} = \frac{1}{8}$. (It is easy to verify directly that
the events H and AH contain $12 \cdot 4^2$ and $12 \cdot 2$ points, respectively.)

(c) Distribution of sexes. Consider families with exactly two chil- 
dren. Letting b and g stand for boy and girl, respectively, and the 
first letter for the older child, we have four possibilities: bb, bg, gb, gg. 
These are the four sample points, and we associate probability 1/4 with
each. Given that a family has a boy (event H), what is the probability
that both children are boys (event A)? The event AH means bb, and
H means bb, or bg, or gb. Therefore, P{A|H} = 1/3; in about one-third
of the families with the characteristic H we can expect that A also will
occur. It is interesting that most people expect the answer to be 1/2.
This is the correct answer to a different question, namely: A boy is 
chosen at random and found to come from a family with two children; 
what is the probability that the other child is a boy? The difference 
may be explained empirically. With our original problem we might 
refer to a card file of families, with the second to a file of males. In 
the latter, each family with two boys will be represented twice, and 
this explains the difference between the two results. 

(d) Stratified populations. Suppose a human population consists of
subpopulations or strata $H_1, H_2, \ldots$. These may be races, age groups,
professions, etc. Let $p_j$ be the probability that an individual chosen
at random belongs to $H_j$. Saying “the probability that an individual
in $H_j$ is left-handed is $q_j$” is short for “the conditional probability of
the event A (left-handedness) on the hypothesis that an individual
belongs to $H_j$ is $q_j$.” The probability that an individual chosen at
random is left-handed is $p_1q_1 + p_2q_2 + p_3q_3 + \cdots$, which is a special case
of (1.8). Given that an individual is left-handed, the conditional
probability of his belonging to stratum $H_j$ is

(1.9)  $P\{H_j \mid A\} = \dfrac{p_j q_j}{p_1q_1 + p_2q_2 + \cdots}$.
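Examples (b), (c) and (d) lend themselves to verification by direct enumeration. The following Python sketch counts sample points for (b) and (c) and evaluates (1.8) and (1.9) for (d). The helper `cond` and the numerical values of $p_j$ and $q_j$ are hypothetical, chosen only to make the arithmetic visible.

```python
from fractions import Fraction
from itertools import product

def cond(space, event, hyp):
    """P{A|H} for equally probable sample points: N_AH / N_H."""
    in_h = [pt for pt in space if hyp(pt)]
    return Fraction(sum(1 for pt in in_h if event(pt)), len(in_h))

# Example (b): four balls in four cells, 4^4 equally probable arrangements.
balls = list(product(range(4), repeat=4))
first_two_differ = lambda a: a[0] != a[1]
some_cell_has_three = lambda a: any(a.count(cell) == 3 for cell in range(4))
p_b = cond(balls, some_cell_has_three, first_two_differ)

# Example (c): two children, older child listed first.
families = ["bb", "bg", "gb", "gg"]
p_c = cond(families, lambda f: f == "bb", lambda f: "b" in f)

# The "different question" of example (c): a file of boys instead of families.
boys = [(f, i) for f in families for i in range(2) if f[i] == "b"]
p_c_boy = Fraction(sum(1 for f, i in boys if f[1 - i] == "b"), len(boys))

# Example (d): hypothetical strata probabilities p_j and left-handedness rates q_j.
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
q = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 20)]
p_left = sum(pj * qj for pj, qj in zip(p, q))            # special case of (1.8)
posterior = [pj * qj / p_left for pj, qj in zip(p, q)]   # formula (1.9)

print(p_b, p_c, p_c_boy, p_left, posterior)
```

The card file of families and the file of boys appear here as two different lists to count over, which is the whole difference between the two answers in example (c).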
2. PROBABILITIES DEFINED BY CONDITIONAL
PROBABILITIES. URN MODELS


In the preceding section we have taken the probabilities in the sample
space for granted and merely calculated a few conditional probabilities.
In applications, many experiments are described by specifying certain
conditional probabilities (although the adjective “conditional” is
usually omitted). Theoretically this means that the probabilities in the sample
space are to be derived from the given conditional probabilities. Thus
in example (1.d) the stratified population is described by the
probabilities $p_j$ of the several strata, and the conditional probability
$q_j$ of the characteristic “left-handed” within each stratum. A few more
examples will reveal the general scheme more effectively than a direct
description could.


Examples. (a) In example I(5.b) we have considered three players
a, b, c taking turns at a game; we have described the points of the
sample space but have not assigned probabilities to them. Suppose
now that the game is such that at each trial each of the two partners
has probability 1/2 of winning. This statement does not contain the
word “conditional probability” but refers to it nonetheless. For it says
that if player a participates in the rth round (event H), his probability
of winning that particular round is 1/2. It follows from equation (1.5)
that the probability of a winning at the first and second try is 1/4; in
symbols, P{aa} = 1/4. A repeated application of (1.5) shows that
P{acc} = 1/8, P{acbb} = 1/16, etc.; that is, a sample point of the scheme
(*) involving r letters has probability $2^{-r}$. This is the assignment of
probabilities used in problem 1,5, but now the description is more
intuitive. (Continued in problem 14.)

(b) Families. We want to interpret the following statement, “The
probability of a family with exactly k children is $p_k$ (where
$p_0 + p_1 + \cdots = 1$). For any family size all sex distributions have equal
probabilities.” Letting b stand for boy and g for girl, our sample space
consists of the points 0 (no children), b, g, bb, bg, gb, gg, bbb, .... The
second assumption in quotation marks can be stated more formally
thus: If it is known that the family has exactly n children, each of the
$2^n$ possible sex distributions has conditional probability $2^{-n}$. The
probability of the hypothesis is $p_n$, and we see from (1.5) that the
absolute probability of any arrangement of n letters b and g is $p_n \cdot 2^{-n}$.

Note that this is an example of a stratified population, the families
of size j forming the stratum $H_j$. As an exercise let A stand for the
event “the family has boys but no girls.” Its probability is obviously
$P\{A\} = p_1 \cdot 2^{-1} + p_2 \cdot 2^{-2} + \cdots$, which is a special case of (1.8). The
hypothesis $H_j$ in this case is “family has j children.” We now ask the
question: If it is known that a family has no girls, what is the
(conditional) probability that it has only one child? Here A is the hypothesis.
Let H be the event “only one child.” Then AH means “one child and
no girl,” and

(2.1)  $P\{H \mid A\} = \dfrac{P\{AH\}}{P\{A\}} = \dfrac{p_1 2^{-1}}{p_1 2^{-1} + p_2 2^{-2} + p_3 2^{-3} + \cdots}$

which is a special case of (1.9).
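To see how the probabilities in the sample space arise from the conditional ones, the following Python sketch builds the sample space of example (b) explicitly for a hypothetical family-size distribution $p_0, \ldots, p_4$ (the numbers are invented for illustration and families with more than four children are ignored) and then evaluates (2.1).

```python
from fractions import Fraction
from itertools import product

# Hypothetical family-size distribution p_0, ..., p_4; the values sum to 1.
p = [Fraction(3, 10), Fraction(1, 4), Fraction(1, 4), Fraction(3, 20), Fraction(1, 20)]

# Sample points "", "b", "g", "bb", ...: each arrangement of n letters
# receives the absolute probability p_n * 2^(-n), as obtained from (1.5).
weight = {"".join(s): p[n] / 2 ** n
          for n in range(len(p)) for s in product("bg", repeat=n)}

def prob(event):
    """Probability of an event given as a predicate on sample points."""
    return sum(w for pt, w in weight.items() if event(pt))

A = lambda pt: len(pt) >= 1 and "g" not in pt    # boys but no girls
H = lambda pt: len(pt) == 1                      # only one child

answer = prob(lambda pt: A(pt) and H(pt)) / prob(A)   # formula (2.1)
print(sum(weight.values()), prob(A), answer)
```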

(c) Urn models for aftereffect. For the sake of definiteness consider 
an industrial plant liable to accidents. The occurrence of an accident 
might be pictured as the result of a superhuman game of chance: Fate 
has in storage an urn containing red and black balls; at regular time 
intervals a ball is drawn at random, a red ball signifying an accident. 
If the chance of an accident remains constant in time, the composition 
of the urn is always the same. But it is conceivable that each accident 
has an aftereffect in that it either increases or decreases the chance of 
new accidents. This corresponds to an urn whose composition changes 
according to certain rules that depend on the outcome of the successive 
drawings. It is easy to invent a variety of such rules to cover various 
situations, but we shall be content with a discussion of the following.¹


Urn model: An urn contains b black and r red balls. A ball is drawn 
at random. It is replaced and, moreover, c balls of the color drawn and d 
balls of the opposite color are added. A new random drawing is made 
from the urn (now containing r + b + c + d balls), and this procedure 
is repeated. Here c and d are arbitrary integers. They may be chosen 
negative, except that in this case the procedure may terminate after 
finitely many drawings for lack of balls. In particular, choosing c = — 1 
and d = 0 we have the model of random drawings without replacement


which terminates after r + b steps. 


¹ The idea to use urn models to describe aftereffects (contagious diseases) seems
to be due to Polya. His scheme (first introduced in F. Eggenberger and G. Polya,
Über die Statistik verketteter Vorgänge, Zeitschrift für Angewandte Mathematik und
Mechanik, vol. 3 (1923), pp. 279-289) served as a prototype for many models
discussed in the literature. The model described in the text and its three special
cases were proposed by B. Friedman, A simple urn model, Communications on Pure
and Applied Mathematics, vol. 2 (1949), pp. 59-70.

To turn our picturesque description into mathematics, note that it
specifies conditional probabilities from which certain basic probabilities
are to be calculated. A typical point of the sample space corresponding
to n drawings may be represented by a sequence of n letters B and R.
The event “black at first drawing” (i.e., the aggregate of all sequences
starting with B) has probability b/(b + r). If the first ball is black,
the (conditional) probability of a black ball at the second drawing is
(b + c)/(b + r + c + d). The (absolute) probability of the sequence
black, black (i.e., the aggregate of the sample points starting with BB)
is therefore, by (1.5),

(2.2)  $\dfrac{b}{b+r} \cdot \dfrac{b+c}{b+r+c+d}$.


The probability of the sequence black, black, black is (2.2) multiplied 
by (b + 2c)/(b + r + 2c + 2d), etc. It is clear that in this way the
probabilities of all sample points can be calculated. (Of course, in the 
case of a negative c or d the number n of drawings should be chosen 
small enough to avoid negative numbers of balls.) It is easily verified 
by induction that the probabilities of all sample points indeed add to 
unity. 
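The rule for computing the probability of a sample point can be written as a short program. The sketch below multiplies the successive conditional probabilities as in (2.2) and confirms, for a few small cases, that the probabilities of all sequences of length n add to unity. The function names and the particular values of b, r, c, d are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def sequence_probability(seq, b, r, c, d):
    """Probability of a colour sequence such as "BBR", by repeated use of (1.5)."""
    prob = Fraction(1)
    for colour in seq:
        if colour == "B":
            prob *= Fraction(b, b + r)
            b, r = b + c, r + d      # add c of the colour drawn, d of the other
        else:
            prob *= Fraction(r, b + r)
            b, r = b + d, r + c
        if prob == 0:                # impossible sequence: stop before counts go negative
            break
    return prob

def total_probability(n, b, r, c, d):
    """Sum over all 2^n sample points of length n."""
    return sum(sequence_probability(seq, b, r, c, d)
               for seq in product("BR", repeat=n))

print(sequence_probability("BB", 2, 3, 1, 1))   # formula (2.2) with b=2, r=3, c=d=1
print(total_probability(4, 2, 3, 2, 1))
print(total_probability(3, 2, 3, -1, 0))        # drawings without replacement
```

For negative c or d the number n of drawings must still be small enough that the urn is never exhausted, as noted in the text.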

Explicit expressions for the probabilities are not readily obtainable
except in the most important and best-known special case, that of

Polya’s urn scheme which is characterized by d = 0, c > 0. Here
after each drawing the number of balls of the color drawn increases,
whereas the balls of opposite color remain unchanged in number. In
effect the drawing of either color increases the probability of the same
color at the next drawing, and we have a rough model of phenomena
such as contagious diseases, where each occurrence increases the
probability of further occurrences. Any sequence of n drawings resulting
in $n_1$ black and $n_2$ red balls ($n_1 + n_2 = n$) has the
same probability as the event of extracting first $n_1$ black and then $n_2$
red balls, namely,

(2.3)  $p_{n_1,n} = \dfrac{b(b+c)(b+2c)\cdots(b+n_1c-c)\cdot r(r+c)\cdots(r+n_2c-c)}{(b+r)(b+r+c)(b+r+2c)\cdots(b+r+nc-c)}$.


On dividing numerator and denominator by c and using the
notation II(2.1), this formula may be rewritten as follows:

(2.4)  $p_{n_1,n} = \dfrac{\left(\frac{b}{c}+n_1-1\right)_{n_1}\left(\frac{r}{c}+n_2-1\right)_{n_2}}{\left(\frac{b+r}{c}+n-1\right)_n}$.


(The Polya scheme is discussed in problems 18-24.) 
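The characteristic property of the Polya scheme, that the probability of a sequence depends only on the numbers of black and red balls and not on their order, can be confirmed on a small case. The sketch below compares every ordering of $n_1$ black and $n_2$ red drawings with the product formula (2.3), and also with the form obtained after dividing each factor by c. The function names and the values b = 3, r = 2, c = 4 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def polya_sequence(seq, b, r, c):
    """Probability of one particular colour sequence in Polya's scheme (d = 0)."""
    prob = Fraction(1)
    for colour in seq:
        if colour == "B":
            prob *= Fraction(b, b + r)
            b += c
        else:
            prob *= Fraction(r, b + r)
            r += c
    return prob

def formula_2_3(n1, n2, b, r, c):
    """Right side of (2.3): first n1 black, then n2 red."""
    num = 1
    for i in range(n1):
        num *= b + i * c
    for i in range(n2):
        num *= r + i * c
    den = 1
    for i in range(n1 + n2):
        den *= b + r + i * c
    return Fraction(num, den)

def falling(x, k):
    """(x)_k = x (x - 1) ... (x - k + 1)."""
    out = Fraction(1)
    for i in range(k):
        out *= x - i
    return out

b, r, c, n1, n2 = 3, 2, 4, 2, 3
orderings = set(permutations("B" * n1 + "R" * n2))
values = {polya_sequence(s, b, r, c) for s in orderings}
divided = (falling(Fraction(b, c) + n1 - 1, n1) * falling(Fraction(r, c) + n2 - 1, n2)
           / falling(Fraction(b + r, c) + n1 + n2 - 1, n1 + n2))
print(len(orderings), values, divided)
```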

In addition to the Polya scheme our urn model contains another 
special case of interest, namely the 

Ehrenfest model² of heat exchange between two isolated bodies. In
the original description, as used by physicists, the Ehrenfest model 
envisages two containers I and II and k particles distributed in them. 
A particle is chosen at random and moved from its container into the 
other container. This procedure is repeated. What is the distribution 
of the particles after n steps? To reduce this to an urn model it suffices 
to call the particles in container I red, the others black. Then at each 
drawing the ball drawn is replaced by a ball of the opposite color, 
that is, we have c = —1,d = 1. It is clear that in this case the proc- 
ess can continue as long as we please (if there are no red balls, a black 
ball is drawn automatically and replaced by a red one). [We shall dis- 
cuss the Ehrenfest model in another way in example XV(2,f).] 
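The question "What is the distribution of the particles after n steps?" can be answered numerically by applying the rule of the model step by step. The following Python sketch does this in exact arithmetic; it is a direct iteration under the assumptions stated in the text (k particles, one chosen at random and moved at each step), and the function name is hypothetical.

```python
from fractions import Fraction

def ehrenfest_distribution(k, start, n):
    """Distribution of the number of particles in container I after n steps."""
    dist = {start: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for j, pj in dist.items():
            if j > 0:    # a particle of container I is chosen: probability j/k
                nxt[j - 1] = nxt.get(j - 1, 0) + pj * Fraction(j, k)
            if j < k:    # a particle of container II is chosen: probability (k-j)/k
                nxt[j + 1] = nxt.get(j + 1, 0) + pj * Fraction(k - j, k)
        dist = nxt
    return dist

print(ehrenfest_distribution(4, 4, 2))
```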

The special case c = 0, d > 0 has been proposed by Friedman as a 
model of a safety campaign. Every time an accident occurs (i.e., a red 
ball is drawn), the safety campaign is pushed harder; whenever no acci- 
dent occurs, the campaign slackens and the probability of an accident 
increases.

(d) Urn models for stratification. Spurious contagion. To continue 
in the vein of the preceding example, suppose that each person is liable 
to accidents and that their occurrence is determined by random draw- 
ings from an urn. This time, however, we shall suppose that no after- 
effect exists, so that the composition of the urn remains unchanged 
throughout the process. Now the chance of an accident or proneness 
to accidents may vary from person to person or from profession to pro- 
fession, and we imagine that each person (or each profession) has his 
own urn. In order not to complicate matters unnecessarily, let us sup- 
pose that there are just two types of people (two professions) and that 
their numbers in the total population stand in the ratio 1:5. We
consider then an urn I containing $r_1$ red and $b_1$ black balls, and an urn II
containing $r_2$ red and $b_2$ black balls. The experiment “choose a person
at random and observe how many accidents he has during n time units”
has the following counterpart: A die is thrown; if ace appears, choose
urn I, otherwise urn II. In each case n random drawings with
replacement are made from the urn. Our experiment describes the situation
of an insurance company accepting a new subscriber.

² P. and T. Ehrenfest, Über zwei bekannte Einwände gegen das Boltzmannsche
H-Theorem, Physikalische Zeitschrift, vol. 8 (1907), pp. 311-314. For a
mathematical discussion see M. Kac, Random walk and the theory of Brownian motion,
American Mathematical Monthly, vol. 54 (1947), pp. 369-391.

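By using (1.8) it is seen that the probability of red at the first
drawing is

(2.5)  $P\{R\} = \dfrac{1}{6}\cdot\dfrac{r_1}{b_1+r_1} + \dfrac{5}{6}\cdot\dfrac{r_2}{b_2+r_2}$

and the probability of a sequence red, red

(2.6)  $P\{RR\} = \dfrac{1}{6}\left(\dfrac{r_1}{b_1+r_1}\right)^2 + \dfrac{5}{6}\left(\dfrac{r_2}{b_2+r_2}\right)^2$.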

No mathematical problem is involved in our model, but it has an 
interesting feature which has caused great confusion in applications. 
Suppose our insurance company observes that a new subscriber has an 
accident during the first year, and is interested in the probability of a 
further accident during the second year. In other words, given that 
the first drawing resulted in red, we ask for the (conditional) proba- 
bility of a sequence red, red. This is clearly the ratio P{RR}/P{R} 
and is different from P{R}. For the sake of illustration suppose that 
$r_1/(b_1 + r_1) = 0.6$ and $r_2/(b_2 + r_2) = 0.06$. The probability of red at
any drawing is 0.15, but if the first drawing resulted in red, the chances 
that the next drawing also results in red are 0.42. Note that our model 
involves no aftereffect in the total population, and yet the occurrence 
of an accident for a person chosen at random increases the odds that 
this same person will have a second accident. We have here an effect 
of sampling; the occurrence of an accident does not have a real effect, 
but it is an indication that the person chosen at random has a high 
proneness to accidents. 
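The figures 0.15 and 0.42 quoted above follow from (1.8) and the definition of conditional probability. A minimal Python sketch of the arithmetic, with variable names chosen here for illustration:

```python
from fractions import Fraction

weights = [Fraction(1, 6), Fraction(5, 6)]       # urn I if ace appears, otherwise urn II
red = [Fraction(6, 10), Fraction(6, 100)]        # proportion of red balls in each urn

p_r = sum(w * x for w, x in zip(weights, red))         # probability of red
p_rr = sum(w * x ** 2 for w, x in zip(weights, red))   # probability of red, red
p_second_given_first = p_rr / p_r                      # P{RR} / P{R}

print(float(p_r), float(p_second_given_first))
```

Within either urn the drawings are independent; the dependence arises only because the first red ball makes it more likely that urn I was the one chosen.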

In the statistical literature it has become customary to use the word 
contagion instead of aftereffect. The apparent aftereffect of sampling 
was at first misinterpreted as an effect of true contagion, and so statis- 
ticians now speak of contagion (or contagious probability distributions) 
in a vague and misleading manner. Take, for example, the ecologist 
searching for insects in a field. If after an unsuccessful period he finds 
an insect, he might conclude that the litter is likely close by and that his 
chances of finding another insect are good. Obviously no aftereffect is 
involved, and yet the statistician speaks of contagion. 


(e) The following example is famous and illustrative but somewhat
artificial. Imagine a population of N + 1 urns, each containing N red
and white balls; the urn number k contains k red and N - k white
balls (k = 0, 1, 2, ..., N). An urn is chosen at random and n random
drawings are made from it, the ball drawn being replaced each time.
Suppose that all n balls turn out to be red (event A). We seek the
(conditional) probability that the next drawing will also yield a red ball
(event B). If the first choice falls on urn number k, then the
probability of extracting in succession n red balls is $(k/N)^n$. Hence, by (1.8),

(2.7)  $P\{A\} = \dfrac{1^n + 2^n + \cdots + N^n}{N^n(N+1)}$.

The event AB means that n + 1 drawings yield red balls, and therefore

(2.8)  $P\{AB\} = P\{B\} = \dfrac{1^{n+1} + 2^{n+1} + \cdots + N^{n+1}}{N^{n+1}(N+1)}$.

The required probability is P{B|A} = P{B}/P{A}.

The sums in (2.7) and (2.8) can be considered Riemann sums
approximating integrals, so that when N is large

(2.9)  $\dfrac{1}{N}\sum_{k=1}^{N}\left(\dfrac{k}{N}\right)^n \approx \int_0^1 x^n\,dx = \dfrac{1}{n+1}$.

We have therefore for large N approximately

(2.10)  $P\{B \mid A\} \approx \dfrac{n+1}{n+2}$.

This formula can be interpreted roughly as follows: If all compositions 
of an urn are equally probable, and if n trials yielded red balls, the 
probability of a red ball at the next trial is (n + 1)/(n + 2). This is 
the so-called law of succession of Laplace (1812). 
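
The quality of the approximation (2.10) can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates the ratio of the sums in (2.8) and (2.7) exactly for a finite number of urns and compares it with (n + 1)/(n + 2). The function names are our own.

```python
from fractions import Fraction


def prob_next_red(N: int, n: int) -> Fraction:
    """Exact P{B|A} for N + 1 urns after n red drawings, from (2.7) and (2.8)."""
    # P{A}: each urn k = 0..N has probability 1/(N + 1) and yields n reds with (k/N)^n.
    p_a = sum(Fraction(k, N) ** n for k in range(N + 1)) / (N + 1)
    # Probability that n + 1 drawings all yield red.
    p_ab = sum(Fraction(k, N) ** (n + 1) for k in range(N + 1)) / (N + 1)
    return p_ab / p_a


def laplace_limit(n: int) -> Fraction:
    """The large-N approximation (2.10)."""
    return Fraction(n + 1, n + 2)


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, float(prob_next_red(N, 5)), float(laplace_limit(5)))
```

For small N the exact value is noticeably larger than the limit; the difference shrinks roughly like 1/N.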

Before the ascendance of the modern theory, the notion of equal 
probabilities was often used as synonymous for “no advance knowl- 
edge.” Laplace himself has illustrated the use of (2.10) by computing 
the probability that the sun will rise tomorrow, given that it has risen 
daily for 5000 years or n = 1,826,213 days. It is said that Laplace was 
ready to bet 1,826,214 to 1 in favor of regular habits of the sun, and 
we should be in a position to better the odds since regular service has 
followed for another century. A historical study would be necessary 
to render justice to Laplace and to understand his intentions. His 
successors, however, used similar arguments in routine work and recommended 
methods of this kind to physicists and engineers in cases 
where the formulas have no operational meaning. We should have to 
reject the method even if, for sake of argument, we were to concede 
that our universe was chosen at random from a collection in which all 
conceivable possibilities were equally likely. In fact, it pretends to 
judge the chances of the sun’s rising tomorrow from the assumed risings 
in the past. But the assumed rising of the sun on February 5, 3123 
B.C., is by no means more certain than that the sun will rise tomorrow. 
We believe in both for the same reasons. 


Note on Bayes’s Rule. In (1.9) and (2.2) we have calculated certain condi- 
tional probabilities directly from the definition. The beginner is advised always to 
do so and not to memorize the formula (2.12), which we shall now derive. It re- 
traces in a general way what we did in special cases, but it is only a way of rewriting 
(1.3). We had a collection of events H_1, H_2, ... which are mutually exclusive and 
exhaustive, that is, every sample point belongs to one, and only one, among the 
H_j. We were interested in 

(2.11)    $P\{H_k \mid A\} = \dfrac{P\{AH_k\}}{P\{A\}}.$

If (1.5) and (1.8) are introduced into (2.11), it takes the form 

(2.12)    $P\{H_k \mid A\} = \dfrac{P\{A \mid H_k\}\, P\{H_k\}}{\sum_j P\{A \mid H_j\}\, P\{H_j\}}.$

If the events H_k are called causes, then (2.12) becomes “Bayes’s rule for the probability 
of causes.” Mathematically, (2.12) is a special way of writing (1.3) and 
nothing more. The formula is useful in many statistical applications of the type 
described in examples (b) and (d), and we have used it there. Unfortunately, 
Bayes's rule has been somewhat discredited by metaphysical applications of the 
type described in example (e). In routine practice this kind of argument can be 
dangerous. A quality control engineer is concerned with one particular machine 
and not with an infinite population of machines from which one was chosen at 
random. He has been advised to use Bayes’s rule on the grounds that it is logically 
acceptable and corresponds to our way of thinking. Plato used this type of argument 
to prove the existence of Atlantis, and philosophers used it to prove the 
[...] by estimating 
and minimizing the sources of various types of errors in prediction and guessing. 
The modern method of statistical tests and estimation is less intuitive but more 
realistic. It may be not only defended but also applied. 
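
Purely as an illustration of the bookkeeping in (2.12), and not as a recommendation to apply it mechanically, the formula can be sketched as follows. The function name `posterior` and the small urn illustration (N = 4 urns beyond urn 0, n = 2 red drawings) are our own choices, not taken from the text.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P{H_k | A} for every k, given P{H_k} and P{A | H_k}, as in (2.12)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P{A H_k}, by (1.5)
    total = sum(joint)                                     # P{A}, by (1.8)
    if total == 0:
        raise ValueError("P{A} = 0: conditional probabilities are undefined")
    return [j / total for j in joint]


# Urns of example (e) with N = 4: urn k holds k red balls out of 4.
N, n = 4, 2
priors = [Fraction(1, N + 1)] * (N + 1)
likelihoods = [Fraction(k, N) ** n for k in range(N + 1)]
post = posterior(priors, likelihoods)

if __name__ == "__main__":
    print(post)
```

The denominator is computed once and shared by all k, which is all that distinguishes (2.12) from a direct use of the definition (1.3).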


3. STOCHASTIC INDEPENDENCE 


In the examples above the conditional probability P{A|H} generally 
does not equal the absolute probability P{A}. Popularly speaking, 
the information whether H has occurred changes our way of betting 
on the event A. Only when P{A|H} = P{A} does this information 
not permit any inference about the occurrence of A. In this case we 
shall say that A is stochastically independent of H. Now (1.5) shows 
that the condition P{A|H} = P{A} can be written in the form 


(3.1)    P{AH} = P{A}·P{H}. 


This equation is symmetric in A and H and shows that whenever A is 
stochastically independent of H, so is H of A. It is therefore preferable 
to start from the following symmetric 


Definition 1. Two events A and H are said to be stochastically independent 
(or independent, for short) if equation (3.1) holds. This definition 
is accepted also if P{H} = 0, in which case P{A|H} is not defined. 
The term statistically independent is synonymous with stochastically 
independent. 


Examples. (a) A card is chosen at random from a deck of playing 
cards. For reasons of symmetry we expect the events “spade” and 
“ace” to be independent. As a matter of fact, their probabilities are 
1/4 and 1/13, and the probability of their simultaneous realization is 1/52. 

(b) Two true dice are thrown. The events “ace with first die” and 
“even face with second” are independent since the probability of their 
simultaneous realization, 3/36 = 1/12, is the product of their probabilities, 
namely 1/6 and 1/2. 

(c) In a random permutation of the four letters (a, b, c, d) the events 
“a precedes b” and “c precedes d” are independent. This is intuitively 
clear and easily verified. 

(d) Sex distribution. We return to example (1.c) but now consider 
families with three children. We assume that each of the eight possibilities 
bbb, bbg, ..., ggg has probability 1/8. Let H be the event “the 
family has children of both sexes,” and A the event “there is at most 
one girl.” Then P{H} = 6/8, and P{A} = 4/8. The simultaneous realization 
of A and H means one of the possibilities bbg, bgb, gbb, and 
therefore P{AH} = 3/8 = P{A}·P{H}. Thus in families with three 
children the two events are independent. Note that this is not true 
for families with two or four children. This shows that it is not always 
obvious whether or not we have independence. 
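
The claim in example (d) can be verified by direct enumeration. The sketch below is our own illustration under the same assumption as the text, namely that all 2^n birth orders are equally likely; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def independent_in_families(n: int) -> bool:
    """Check P{AH} = P{A}·P{H} for families with n children."""
    families = list(product("bg", repeat=n))  # all 2^n equally likely orders

    def prob(event):
        return Fraction(sum(1 for f in families if event(f)), len(families))

    def H(f):  # children of both sexes
        return "b" in f and "g" in f

    def A(f):  # at most one girl
        return f.count("g") <= 1

    return prob(lambda f: A(f) and H(f)) == prob(A) * prob(H)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, independent_in_families(n))
```

Exact fractions are used so that the equality test is not disturbed by rounding.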


If H occurs, the complementary event H’ does not occur, and vice 
versa. Stochastic independence implies that no inference can be drawn 
from the occurrence of H to that of A; therefore stochastic independence 
of A and H should mean the same as independence of A and H’ 
(and, because of symmetry, also of A’ and H, and of A’ and H’). This 
assertion is easily verified, using the relation P{H’} = 1 − P{H}. If 
(3.1) holds, then (since AH’ = A − AH) 

(3.2)    P{AH’} = P{A} − P{AH} = P{A} − P{A}·P{H} = P{A}·P{H’}, 

as expected. 
Suppose now that three events A, B, and C are pairwise independent 
so that 

         P{AB} = P{A}·P{B} 
(3.3)    P{AC} = P{A}·P{C} 
         P{BC} = P{B}·P{C}. 

We might think that this always implies the independence of such 
pairs of events as AB and C. Unfortunately this is not necessarily so. 
We shall exhibit an example in which (3.3) is true but the simultaneous 
occurrence of A, B, and C is impossible, so that AB and C cannot be 
independent. 


Example. (e) Two dice are thrown and three events are defined as 
follows: A means “odd face with first die”; B means “odd face with 
second die”; finally, C means “odd sum” (one face even, the other odd). 
If each of the 36 sample points has probability 1/36, then any two of 
the events are clearly independent. The probability of each is 1/2, and 
so is its conditional probability, assuming that one of the other two 
events has occurred. Nevertheless, the three events cannot occur 
simultaneously. The information that A but not B has occurred assures 
that C has occurred, and a similar statement holds for all other 
combinations. 
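
Example (e) is small enough to be checked by enumerating the 36 sample points. The following sketch is our own; the helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes of two dice.
points = list(product(range(1, 7), repeat=2))


def prob(event):
    return Fraction(sum(1 for pt in points if event(pt)), len(points))


def A(pt):  # odd face with first die
    return pt[0] % 2 == 1


def B(pt):  # odd face with second die
    return pt[1] % 2 == 1


def C(pt):  # odd sum
    return (pt[0] + pt[1]) % 2 == 1


# Pairwise independence: each pair satisfies the product rule.
pairwise = all(
    prob(lambda pt: X(pt) and Y(pt)) == prob(X) * prob(Y)
    for X, Y in [(A, B), (A, C), (B, C)]
)
# The three events never occur together.
p_abc = prob(lambda pt: A(pt) and B(pt) and C(pt))

if __name__ == "__main__":
    print(pairwise, p_abc, prob(A) * prob(B) * prob(C))
```

The product of the three probabilities is 1/8, while the probability of the simultaneous occurrence is 0.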


It is desirable to reserve the term stochastic independence for the 
case where no such inference is possible. Then not only (3.3) must 
hold but in addition 

(3.4)    P{ABC} = P{A}P{B}P{C}. 

This equation insures that A and BC are independent and also that 
the same is true of B and AC, and C and AB. Furthermore, it can 
now be proved also that A ∪ B and C are independent. In fact, by 
the fundamental relation I(7.4) we have 

(3.5)    P{(A ∪ B)C} = P{AC} + P{BC} − P{ABC}. 

Now, applying (3.3) and (3.4) to the right side, we can factor out P{C}. 
The other factor is P{A} + P{B} − P{AB} = P{A ∪ B} so that 

(3.6)    P{(A ∪ B)C} = P{A ∪ B}·P{C}. 
This makes it plausible that the conditions (3.3) and (3.4) together 
suffice to avoid embarrassment; any event expressible in terms of A 
and B will be independent of C. 

In the general case of n events the following definition proves satisfactory. 

Definition 2. The events A_1, A_2, ..., A_n are called mutually independent 
if for all combinations 1 ≤ i < j < k < ... ≤ n the multiplication rules 

         P{A_i A_j} = P{A_i} P{A_j} 
         P{A_i A_j A_k} = P{A_i} P{A_j} P{A_k} 
(3.7)    . . . . . . . . . . . . . . . . . . . 
         P{A_1 A_2 ... A_n} = P{A_1} P{A_2} ... P{A_n} 

apply. 


The first line stands for $\binom{n}{2}$ equations, the second for $\binom{n}{3}$, etc. We 
have, therefore, 

    $\binom{n}{2} + \binom{n}{3} + \cdots + \binom{n}{n} = (1+1)^n - \binom{n}{1} - \binom{n}{0} = 2^n - n - 1$

conditions which must be satisfied. On the other hand, the $\binom{n}{2}$ conditions 
stated in the first line suffice to insure pairwise independence. 
The whole system (3.7) looks like a complicated set of conditions, but 
it will soon become apparent that its validity is usually obvious and 
requires no checking. It is readily seen by induction [starting with 
n = 2 and (3.2)] that 


In definition 2 the system (3.7) may be replaced by the system of the 2^n 
equations obtained from the last equation in (3.7) on replacing an arbitrary 
number of events A_i by their complements A_i’. 
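
For a finite sample space the whole system (3.7) can be checked mechanically. The sketch below is our own illustration: it runs through every combination of two or more events, so that for n mutually independent events it examines 2^n − n − 1 rules. The function name and the coin-tossing illustration are assumptions of ours, not part of the text.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod


def mutually_independent(points, events):
    """Check every multiplication rule in (3.7).

    points: dict mapping each sample point to its probability.
    events: list of sets of sample points.
    Returns (do all examined rules hold?, number of rules examined).
    """
    def prob(event):
        return sum((points[x] for x in event), Fraction(0))

    examined = 0
    for size in range(2, len(events) + 1):
        for combo in combinations(events, size):
            examined += 1
            if prob(set.intersection(*combo)) != prod(prob(e) for e in combo):
                return False, examined  # stop at the first rule that fails
    return True, examined


# Three tosses of a fair coin; event i is "heads at toss i".
tosses = {pt: Fraction(1, 8) for pt in product("HT", repeat=3)}
heads = [{pt for pt in tosses if pt[i] == "H"} for i in range(3)]

if __name__ == "__main__":
    print(mutually_independent(tosses, heads))
```

The search stops at the first violated rule, so the returned count equals 2^n − n − 1 only when all rules hold or the last one fails.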

The distinction between mutual and pairwise independence is of theo- 
retical rather than practical interest. Practical examples of pairwise 
independent events that are not mutually independent apparently do 
not exist. The possibility of such an occurrence was discovered by S. 
Bernstein. 
4. REPEATED TRIALS 


The notion of stochastic independence finally enables us to formulate 
analytically the intuitive concept of experiments “repeated under iden- 
tical conditions.” 

Consider the sample space 𝔖 representing a certain conceptual experiment. 
Let the sample points be E_1, E_2, ... and denote their 
probabilities by p_1, p_2, .... The possible results of a succession of two 
similar experiments are the pairs (E_j, E_k), and they form a new sample 
space. In it probabilities can be assigned in many ways. However, if 
the experimentalist says that two measurements are performed under 
identical conditions, he implies independence; the first outcome should 
have no influence on the second. This means that the two events “first 
outcome is E_j” and “second outcome is E_k” should be stochastically 
independent or that 

(4.1)    P{(E_j, E_k)} = p_j p_k. 

This equation assigns a probability to every pair (E_j, E_k). Before we 
can use (4.1) as a definition of probabilities in the new sample space, 
we must show that the quantities p_j p_k add to unity. Now, in the sum 
ΣΣ p_j p_k each term appears once, and only once, so that ΣΣ p_j p_k = 
(p_1 + p_2 + ...)(p_1 + p_2 + ...) = 1. Hence (4.1) is acceptable as a definition 
of probabilities. 

Let A and B be two arbitrary events in the original sample space 𝔖. 
We denote the event “A occurred at first trial and B at second” by 
(A, B). Suppose A contains the points E_{a_1}, E_{a_2}, ... and B the points 
E_{b_1}, E_{b_2}, .... Then (A, B) is the union of all pairs (E_{a_j}, E_{b_k}), and as 
before we see that 

(4.2)    P{(A, B)} = ΣΣ p_{a_j} p_{b_k} = (Σ p_{a_j})(Σ p_{b_k}) = P{A}·P{B}. 

Hence the events A and B are independent. We see that the definition 
(4.1) entails that all events at the second trial be independent of events 
at the first trial. For the purposes of probability theory this describes 
“identical experiments.” 


These considerations obviously also apply to a succession of r experi- 
ments and lead to the 


Definition 1. Let 𝔖 be a sample space with sample points E_1, E_2, ... 
and corresponding probabilities p_1, p_2, .... By r independent trials 
corresponding to 𝔖 we mean the sample space whose points are the r-tuples 
(E_{j_1}, E_{j_2}, ..., E_{j_r}) to which the probabilities 

(4.3)    P{(E_{j_1}, E_{j_2}, ..., E_{j_r})} = p_{j_1} p_{j_2} ... p_{j_r} 

are assigned. 
In other words, each point of the new space is a sample of size r 
(with possible repetitions) of points of the original space, and prob- 
abilities are defined by the multiplication rule (4.3). The reader is 
reminded that (4.3) is not the only possible definition of probabilities. 
In other words, repeated trials are not necessarily independent. For 
example, the Polya urn scheme [example (2.c)] defines dependent trials. 
Equation (4.3) defines independent trials or, in physical terms, trials 
repeated under identical conditions. 
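
The construction (4.3) is easily carried out for a small sample space. The following sketch is ours; the three-point space with probabilities 1/2, 1/3, 1/6 is a hypothetical illustration. It builds the space of r independent trials and verifies that the probabilities add to unity and that (4.2) holds for one choice of A and B.

```python
from fractions import Fraction
from itertools import product
from math import prod


def independent_trials(p, r):
    """Map every r-tuple of sample points to its probability under (4.3)."""
    return {pts: prod(p[e] for e in pts) for pts in product(p, repeat=r)}


# Hypothetical single-trial probabilities (our choice, for illustration only).
p = {"E1": Fraction(1, 2), "E2": Fraction(1, 3), "E3": Fraction(1, 6)}
space = independent_trials(p, 2)

A = {"E1", "E2"}  # an event decided by the first trial
B = {"E3"}        # an event decided by the second trial
p_AB = sum(q for (first, second), q in space.items() if first in A and second in B)

if __name__ == "__main__":
    print(sum(space.values()), p_AB)
```

Any other assignment of probabilities to the pairs would also define a sample space, but only the product rule expresses independence.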

The argument which led to (4.2) shows more generally the truth of 
the following theorem concerning independent trials. 


Theorem. Suppose that a system of events A_1, A_2, ..., A_r is such 
that the jth trial alone decides whether or not A_j occurs; then the events 
A_1, ..., A_r are mutually independent if the trials are independent, that 
is, if (4.3) holds. 


If 𝔖 contains a finite number, N, of points, then there are N^r sample 
points (E_{j_1}, ..., E_{j_r}). If each point of 𝔖 has probability 1/N, then 
(4.3) assigns probability N^{-r} to each point (E_{j_1}, ..., E_{j_r}). The new 
approach is conceptually preferable to a formal assignment of equal 
probabilities because it applies to sample spaces with unequal probabilities 
and also to infinite sample spaces. It is indispensable for the 
general theory of probability where we consider even a single trial as 
the first in a potentially infinite sequence. We are then dealing only 
with infinite sequences (E_{j_1}, E_{j_2}, ...) of possible outcomes, and in this 
new space probabilities are defined in a way consistent with (4.3). 
Unfortunately this leads beyond the theory of discrete sample spaces, 
to which the present volume is restricted. We have a more elementary 
theory but pay for it by the necessity of changing the sample space 
according to the number of trials. 

In the preceding discussion we have considered only repetitions of 
the same experiment, but successions of unlike experiments can be 
treated in the same way. If we first toss a coin, then throw a die, we 
naturally assume that the two experiments are independent. This 
amounts to assigning probabilities by the product rule. Thus 
P{(heads, ace)} = 1/2 · 1/6, etc. In this particular case this is equivalent 
to assigning equal probabilities to all twelve sample points, but in 
general we must proceed as in (4.3). 


Definition 2. Let 𝔖’ and 𝔖” be two sample spaces and denote their 
points by E’_1, E’_2, ... and E”_1, E”_2, .... Let the corresponding probabilities 
be p’_1, p’_2, ... and p”_1, p”_2, .... The succession of the two 
experiments is described by the space with points (E’_j, E”_k). Saying that 
the two successive experiments are independent means defining probabilities 
by 

(4.4)    P{(E’_j, E”_k)} = p’_j p”_k. 


[The notions which were just introduced are by no means peculiar to probability 
theory. Given two spaces 𝔖’ and 𝔖” with generic points E’ and E”, the set of all 
pairs (E’, E”) is called the combinatorial product of 𝔖’ and 𝔖” and is usually denoted 
by 𝔖’ × 𝔖”. For example, the Cartesian plane, that is the set of pairs (x, y), is 
the combinatorial product of the x-axis and the y-axis. (The three dimensional 
space may be viewed either as triple product or as product of the x, y-plane and 
the z-axis.) The equation (4.4) defines what is usually called the product measure 
of the probabilities in 𝔖’ and 𝔖”. We used the word experiment as equivalent to 
a sample space with a probability defined in it. Similarly, succession of two independent 
experiments is short for combinatorial product of the corresponding sample 
spaces with probabilities defined by (4.4). 

These notions carry over in an obvious way to products of any number of spaces. 
For example, in (4.3) there figures the r-tuple combinatorial product of 𝔖 with 
itself. Where the student of probability speaks of the first, second, ..., trial, other 
mathematicians use the term: first, second, ..., coordinate space. (An event which 
depends only on the outcome of the first trial is also called a cylindrical set over 
the first coordinate space.)] 


The aggregate of all pairs (i, j) where i, j are positive integers between 1 and n 
forms the product of the set of integers 1, 2, ..., n with itself. In sampling without 
replacement pairs of the form (i, i) are forbidden, and therefore taking a sample of 
size two without replacement does not directly lead to a product space. Nevertheless, 
as the following examples will show, it is possible to represent it in a different 
way as a succession of independent experiments, and the same method applies to 
more complicated cases. 

Examples. (a) Permutations. We have considered the n! permutations 
of a_1, a_2, ..., a_n as points of a sample space and attributed probability 
1/n! to each. We may consider the same sample space as representing 
n − 1 successive experiments as follows. Begin by writing 
down a_1. The first experiment consists in putting a_2 either before or 
after a_1. This done, we have three places for a_3 and the second experiment 
consists of a choice among them, deciding on the relative order 
of a_1, a_2, and a_3. In general, when a_1, ..., a_k are put into some relative 
order, we proceed with experiment number k, which consists in 
selecting one of the k + 1 places for a_{k+1}. In other words, we have a 
succession of n − 1 experiments of which the kth can result in k + 1 different 
choices (sample points), each having probability 1/(k + 1). The experiments 
are independent, that is, the probabilities are multiplicative. 
Each permutation of the n elements has probability 1/2 · 1/3 ··· 1/n, in 
accordance with the original definition. 
(b) Sampling without replacement. Let the population be (a_1, a_2, ..., 
a_n). In sampling without replacement each choice removes an element. 
After k steps there remain n − k elements, and the next choice can be 
described by specifying the number ν of the place of the element chosen 
(ν = 1, 2, ..., n − k). In this way the taking of a sample of size r 
without replacement becomes a succession of r experiments where the 
first has n possible results, the second n − 1, the third n − 2, etc. 
We attribute equal probabilities to all results of the individual experiments 
and postulate that the r experiments are independent. This 
amounts to attributing probability 1/(n)_r to each sample in accordance 
with our definition of random samples. (Note that for n = 100, r = 3, 
the sample (a_13, a_40, a_81) means choices number 13, 39, 79, respectively. 
We must say that at the third experiment the seventy-ninth element 
of the reduced population of n − 2 was chosen, for with the original 
numbering the outcomes of the third experiment would depend on the 
first two choices.) We see that the notion of repeated independent 
experiments permits us to study sampling as a succession of individual 
operations. 
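
Both examples can be traced step by step. The sketch below is our own illustration with hypothetical function names. The first function enumerates all outcomes of the n − 1 insertion experiments of example (a) and accumulates the product probabilities; the second translates a sample given by original indices into the place numbers of example (b); the third evaluates 1/(n)_r.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import perm


def insertion_permutations(n):
    """Probability of each permutation of 1..n built by n - 1 insertion experiments."""
    result = Counter()
    # Experiment k (k = 1, ..., n - 1) picks one of k + 1 places for element k + 1.
    for choices in product(*(range(k + 1) for k in range(1, n))):
        arrangement = [1]
        weight = Fraction(1)
        for k, place in enumerate(choices, start=1):
            arrangement.insert(place, k + 1)
            weight *= Fraction(1, k + 1)  # independence: probabilities multiply
        result[tuple(arrangement)] += weight
    return result


def places_in_reduced_population(sample, n):
    """Translate original indices (1-based) into places among the elements still present."""
    remaining = list(range(1, n + 1))
    places = []
    for index in sample:
        places.append(remaining.index(index) + 1)
        remaining.remove(index)
    return places


def sample_probability(n, r):
    """Probability 1/(n)_r of one ordered sample of size r without replacement."""
    return Fraction(1, perm(n, r))


if __name__ == "__main__":
    print(insertion_permutations(3))
    print(places_in_reduced_population((13, 40, 81), 100))
    print(sample_probability(100, 3))
```

Every permutation is produced by exactly one sequence of choices, which is why the accumulated weights all equal 1/n!.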


*5. APPLICATIONS TO GENETICS 


The theory of heredity, originated by G. Mendel (1822-1884), pro- 
vides instructive illustrations for the applicability of simple probability 
models. We shall restrict ourselves to indications concerning the most 
elementary problems. In describing the biological background, we shall 
necessarily oversimplify and concentrate on such facts as are pertinent 
to the mathematical treatment. 

Heritable characters depend on special carriers, called genes. All 
cells of the body, except the reproductive cells or gametes, carry exact 
replicas of the same gene structure. The salient fact is that genes ap- 
pear in pairs. The reader may picture them as a vast collection of beads 
on short pieces of string, the chromosomes. These also appear in pairs, 
and paired genes occupy the same position on paired chromosomes. In 
the simplest case each gene of a particular pair can assume two forms 
(alleles), A and a. Then three different pairs can be formed, and, with 
respect to this particular pair, the organism belongs to one of the three 
genotypes AA, Aa, aa (there is no distinction between Aa and aA). For 
example, peas carry a pair of genes such that A causes red blossom 
color and a causes white. The three genotypes are in this case distin- 
guishable as red, pink, and white. Each pair of genes determines one 
heritable factor, but the majority of observable properties of organisms 
depend on several factors. For some characteristics (e.g., eye color and 
left-handedness) the influence of one particular pair of genes is predominant, 
and in such cases the effects of Mendelian laws are readily 
observable. Other characteristics, such as height, can be understood 
as the cumulative effect of a very large number of genes [cf. example 
X(5.c)]. Here we shall study genotypes and inheritance for only one 
particular pair of genes with respect to which we have the three genotypes 
AA, Aa, aa. Frequently there are N different forms A_1, ..., A_N 
for the two genes and, accordingly, $\binom{N+1}{2}$ genotypes A_1A_1, A_1A_2, 
..., A_NA_N. The theory applies to this case with obvious modifications 
(cf. problem 27). The following calculations apply also to the 
case where A is dominant and a recessive. By this is meant that Aa-individuals 
have the same observable properties as AA, so that only 
the pure aa-type shows an observable influence of the a-gene. All 
shades of partial dominance appear in nature. Typical partially recessive 
properties are blue eyes, left-handedness, etc. 

* This section treats a special subject and may be omitted. 

The reproductive cells, or gametes, are formed by a splitting process 
and receive one gene only. Organisms of the pure AA- and aa-geno- 
types (or homozygotes) produce therefore gametes of only one kind, 
but Aa-organisms (hybrids or heterozygotes) produce A- and a-gametes 
in equal numbers. New organisms are derived from two parental gam- 
etes from which they receive their genes. Therefore each pair includes 
a paternal and a maternal gene, and any gene can be traced back to 
one particular ancestor in any generation, however remote. 

The genotypes of offspring depend on a chance process. At every 
occasion, each parental gene has probability 1/2 to be transmitted, and 
the successive trials are independent. In other words, we conceive of 
the genotypes of n offspring as the result of n independent trials, each 
of which corresponds to the tossing of two coins. For example, the 
genotypes of descendants of an Aa × Aa pairing are AA, Aa, aa with 
respective probabilities 1/4, 1/2, 1/4. An AA × aa union can have only 
Aa-offspring, etc. 

Looking at the population as a whole, we conceive of the pairing of 
parents as the result of a second chance process. We shall investigate 
only the so-called random mating, which is defined by this condition: 
If r descendants in the first filial generation are chosen at random, then 
their parents form a random sample of size r, with possible repetitions, 
from the aggregate of all possible parental pairs. In other words, each 
descendant is to be regarded as the product of a random selection of 
parents, and all selections are mutually independent. Random mating 
is an idealized model of the conditions prevailing in many natural populations 
and in field experiments. However, if red peas are sown in one 
corner of the field and white peas in another, parents of like color will 
unite more often than under random mating. Preferential selectivity 
(such as blonde preferring blondes) also violates the condition of ran- 
dom mating. Extreme non-random mating is represented by self-fer- 
tilizing plants and artificial inbreeding. Some such assortative mating 
systems will be analyzed mathematically, but for the most part we shall 
restrict our attention to random mating. 

The genotype of an offspring is the result of four independent random 
choices. The genotypes of the two parents can be selected in 3·3 ways, 
their genes in 2·2 ways. However, we may combine two selections and 
describe the process as one of double selection thus: The paternal and 
maternal gene are each selected independently and at random from 
the population of all genes carried by males or females of the parental 
population. 

Suppose that the three genotypes AA, Aa, aa occur among males 
and females in the same ratios, u:2v:w. We shall suppose u + 2v + w = 1 
and call u, 2v, w the genotype frequencies. Put 

(5.1)    p = u + v,    q = v + w. 

Clearly the numbers of A- and a-genes are as p:q, and since p + q = 1 
we shall call p and q the gene frequencies of A and a. In each of the 
two selections an A-gene is selected with probability p, and, because of 
the assumed independence, the probability of an offspring being AA 
is p^2. The genotype Aa can occur in two ways, and its probability is 
therefore 2pq. Thus, under random mating conditions an offspring 
belongs to the genotypes AA, Aa, or aa with probabilities 

(5.2)    u_1 = p^2,    2v_1 = 2pq,    w_1 = q^2. 


Examples. (a) All parents are Aa (heterozygotes); then u = w = 0, 
2v = 1, and p = q = 1/2. (b) AA- and aa-parents are mixed in equal 
proportions; then u = w = 1/2, v = 0, and again p = q = 1/2. (c) Finally, 
u = w = 1/4, 2v = 1/2; again p = q = 1/2. In all three cases we 
have for the filial generation u_1 = 1/4, 2v_1 = 1/2, w_1 = 1/4. 

For a better understanding of the implications of (5.2) let us fix the 
gene frequencies p and q (p + q = 1) and consider all systems of genotype 
frequencies u, 2v, w for which u + v = p and v + w = q. They 
all lead to the same probabilities (5.2) for the first filial generation. 
Among them there is the particular distribution 

(5.3)    u = p^2,    2v = 2pq,    w = q^2. 


If the frequencies u, v, w in the original generation stand in the particular 
relation (5.3), as in example (c), then we find for the genotype 
probabilities in the first filial generation u_1 = u, v_1 = v, and w_1 = w. 
Therefore we call genotype distributions of the form (5.3) stationary. 
To every ratio p:q there corresponds a stationary distribution, or equilibrium. 
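
The passage from a parental distribution to the filial probabilities is a two-line computation. The sketch below is our own illustration of (5.1) and (5.2); the function name and the choice of exact fractions are ours. It reproduces the three examples and shows that a second application changes nothing.

```python
from fractions import Fraction


def next_generation(u, two_v, w):
    """Genotype probabilities (u_1, 2v_1, w_1) of the filial generation under random mating."""
    v = two_v / 2
    p, q = u + v, v + w        # gene frequencies, (5.1)
    return p * p, 2 * p * q, q * q  # (5.2)


examples = {
    "a": (Fraction(0), Fraction(1), Fraction(0)),
    "b": (Fraction(1, 2), Fraction(0), Fraction(1, 2)),
    "c": (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
}

if __name__ == "__main__":
    for name, parents in examples.items():
        print(name, next_generation(*parents))
    first = next_generation(Fraction(1, 10), Fraction(3, 10), Fraction(6, 10))
    print(first, next_generation(*first))
```

The function treats the frequencies as exact probabilities; the chance fluctuations of an actual finite population are not represented.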

Equations (5.2) give the genotype probabilities for a randomly se- 
lected individual of the second generation. In a large population we 
must expect the actual genotype frequencies to be close to the theoretical 
distribution.³ Now, whatever the distribution u:2v:w in the 
parental generation, equations (5.2) define a stationary distribution; 
in it the genes A and a appear with frequencies [cf. (5.1)] u_1 + v_1 = u + v = p 
and v_1 + w_1 = v + w = q. In other words, if the observed 
frequencies coincided exactly with the calculated probabilities, then 
the first filial generation would have a stationary genotype distribution 
which would perpetuate itself without change in all succeeding genera- 
tions. In practice, deviations will be observed, but for large popula- 
tions we can say: Whatever the composition of the parent population may 
be, random mating will within one generation produce an approximately 
stationary genotype distribution with unchanged gene frequencies. From 
the second generation on, there is no tendency toward a systematic 
change; a steady state is reached with the first filial generation. This 
was first noticed by G. H. Hardy, who thus resolved assumed diffi- 
culties in Mendelian laws. It follows in particular that under condi- 
tions of random mating the frequencies of the three genotypes must 
stand in the ratios p^2 : 2pq : q^2. This can in turn be used to check the 
assumption of random mating. 

Hardy also pointed out that emphasis must be put on the word 
“approximately.” Even with a stationary distribution we must expect 
small changes from generation to generation, which leads us to the fol- 
lowing picture. Starting from any parent population, random mating 
tends to establish the stationary distribution (5.3) within one generation. 
For a stationary distribution there is no tendency toward a systematic 
change of any kind. However, chance fluctuations will change 
the gene frequencies p and q from generation to generation, and the 
genetic composition will slowly drift. There are no restoring forces 
seeking to re-establish original frequencies. On the contrary, our simplified 
model leads to the conclusion [cf. example XV(2.k)] that, for a 
population bounded in size, one gene should ultimately die out, so that 
the population would eventually belong to one of the pure types, AA 
or aa. In nature this does not necessarily occur because of the creation 
of new genes by mutations, selections, and many other effects, 
which can be studied by more refined mathematical tools (Markov 
chains, diffusion theory). 

³ Without this our probability model would be void of operational meaning. The 
statement is made precise by the law of large numbers and the central limit theorem, 
which permits us to estimate the effect of chance fluctuations. 

⁴ G. H. Hardy, Mendelian proportions in a mixed population, Letter to the 
Editor, Science, N.S., vol. 28 (1908), pp. 49-50. Anticipating the language of 
chapters IX and XV, we can describe the situation as follows. The frequencies of 
the three genotypes in the nth generation are three random variables whose expected 
values are given by (5.2) and do not depend on n. Their actual values will 
vary from generation to generation and form a stochastic process of the Markov 
type. 
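
The drift toward a pure type can be observed in a small simulation. The following is only a sketch under assumptions of our own: the population carries a fixed number of genes, and each gene of the next generation is drawn independently from the gene pool of the present one. This is one simple way of making the random mating model finite; the function name, the parameters, and the seed are hypothetical.

```python
import random


def generations_until_fixation(num_genes, initial_A, seed, max_generations=100_000):
    """Simulate the number of A-genes until it reaches 0 or num_genes.

    Returns (generation at which one gene has died out, final count of A-genes),
    or (None, count) if the limit max_generations is reached first.
    """
    rng = random.Random(seed)
    count = initial_A
    for generation in range(max_generations):
        if count in (0, num_genes):
            return generation, count
        p = count / num_genes  # current frequency of the A-gene
        # Each gene of the next generation is an A-gene with probability p.
        count = sum(1 for _ in range(num_genes) if rng.random() < p)
    return None, count


if __name__ == "__main__":
    for seed in range(5):
        print(generations_until_fixation(20, 10, seed))
```

Once the count reaches 0 or the total number of genes it can never change again, which is why the loop may stop there.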

Hardy’s theorem is frequently interpreted to imply a strict stability 
for all times. It is a common fallacy to believe that the law of large 
numbers acts as a force endowed with memory seeking a return to the 
original state, and many wrong conclusions have been drawn from this 
assumption. (The biological processes here considered are typical of 
the important class of Markov processes which will be studied in detail 
in chapter XV.) Note that Hardy’s law does not apply to the distri- 
bution of two pairs of genes (e.g., eye color and left-handedness) with 
the nine genotypes AABB, AABb, ..., aabb. There is still a tendency 
toward a stationary distribution, but equilibrium is not reached in the 
first generation (cf. problem 31). 


*6. SEX-LINKED CHARACTERS 


In the introduction to the preceding section it was mentioned that 
genes lie on chromosomes. These appear in pairs and are transmitted 
as units, so that all genes on a chromosome stick together. Our scheme 
for the inheritance of genes therefore applies also to chromosomes as 
units. Sex is determined by two chromosomes; females are XX, males 
XY. The mother necessarily transmits an X-chromosome, and the sex 
of offspring depends on the chromosome transmitted by the father. 
Accordingly, male and female gametes are produced in equal numbers. 
The difference in birth rate for boys and girls is explained by variations 
in prenatal survival chances. 

It has been said that both genes and chromosomes appear in pairs. 
There is an exception inasmuch as the genes situated on the X-chromo- 
some have no corresponding gene on Y. Females have two X-chromo- 
somes, and hence two of such X-linked genes; however, in males the 
X-genes appear as singles. Typical are two sex-linked genes causing 


* This section treats a special topic and may be omitted. 
ë This picture is somewhat complicated by occasional breakings and recombina- 


tions of chromosomes [cf., problem I1(10.12)]. 


126 CONDITIONAL PROBABILITY [v.6 


colorblindness and haemophilia. With respect to each of them, females 
can still be classified into the three genotypes, AA, Aa, aa, but, having 
only one gene, males have only the two genotypes A and a. Note that
a son always has the father’s Y-chromosome so that a sex-linked char- 
acter cannot be inherited from father to son. However, it can pass 
from father to daughter and from her to a grandson. 

We now proceed to generalize the analysis of the preceding section. 
Assume again random mating and let the frequencies of the genotypes 
AA, Aa, aa in the female population be u, 2v, w, respectively. As 
before, put p = u + v, q = v + w. The frequencies of the two male
genotypes A and a will be denoted by p' and q' (p' + q' = 1). Then
p and p' are the frequencies of the A-gene in the female and male
populations, respectively. The probability for a female descendant to
be of genotype AA, Aa, aa will be denoted by $u_1$, $2v_1$, $w_1$; the analogous
probabilities for the male types A and a are $p'_1$, $q'_1$. Now a male
offspring receives his X-chromosome from the female parent, and hence


(6.1)  $p'_1 = p, \quad q'_1 = q.$

For the three female genotypes we find, as in section 5,

(6.2)  $u_1 = pp', \quad 2v_1 = pq' + qp', \quad w_1 = qq'.$

Hence

(6.3)  $p_1 = u_1 + v_1 = \frac{1}{2}(p + p'), \quad q_1 = v_1 + w_1 = \frac{1}{2}(q + q').$


We can interpret these formulas as follows. Among the male de- 
scendants the genes A and a appear approximately with the frequencies 
p, q of the maternal population; the gene frequencies among female 
descendants are approximately $p_1$ and $q_1$, or halfway between those
of the paternal and maternal populations. We discern a tendency 
toward equalization of the gene frequencies. In fact, from (6.1) and 
(6.3) we get 


(6.4)  $p_1 - p'_1 = \frac{1}{2}(p' - p), \quad q_1 - q'_1 = \frac{1}{2}(q' - q).$

This means that random mating will in one generation reduce approxi- 
mately by one-half the differences between gene frequencies among 
males and females. However, it will not eliminate the differences, and 
a tendency toward further reduction will subsist. In contrast to 
Hardy’s law, we have here no stationary situation after one generation. 
We can pursue the systematic component of the changes from generation
to generation by neglecting chance fluctuations and identifying
the theoretical probabilities (6.2) and (6.3) with corresponding actual
frequencies in the first filial generation.⁶ For the second generation we
obtain by the same process

(6.5)  $p_2 = \frac{1}{2}(p_1 + p'_1) = \frac{3}{4}p + \frac{1}{4}p', \quad q_2 = \frac{1}{2}(q_1 + q'_1) = \frac{3}{4}q + \frac{1}{4}q',$

and, of course, $p'_2 = p_1$, $q'_2 = q_1$. A few more trials will lead to the
general expression for the probabilities $p_n$ and $q_n$ among females of the
nth descendant generation. Put


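(6.6)  $\alpha = \frac{1}{3}(2p + p'), \quad \beta = \frac{1}{3}(2q + q').$

(Note that $\alpha + \beta = 1$.) Then

(6.7)  $p_n = \frac{p_{n-1} + p'_{n-1}}{2} = \alpha + (-1)^n \frac{p - p'}{3 \cdot 2^n},$
       $q_n = \frac{q_{n-1} + q'_{n-1}}{2} = \beta + (-1)^n \frac{q - q'}{3 \cdot 2^n},$

and $p'_n = p_{n-1}$, $q'_n = q_{n-1}$. Hence

(6.8)  $p_n \to \alpha, \quad p'_n \to \alpha, \quad q_n \to \beta, \quad q'_n \to \beta.$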


The genotype frequencies in the female population, as given by (6.2), 
are 


(6.9)  $u_n = p_{n-1}p'_{n-1}, \quad 2v_n = p_{n-1}q'_{n-1} + q_{n-1}p'_{n-1}, \quad w_n = q_{n-1}q'_{n-1}.$

Hence

(6.10)  $u_n \to \alpha^2, \quad 2v_n \to 2\alpha\beta, \quad w_n \to \beta^2.$


These formulas show that there is a strong systematic tendency, 
from generation to generation, toward a state where the genotypes A 
and a appear among males with frequencies $\alpha$ and $\beta$, and the female
genotypes AA, Aa, aa have probabilities $\alpha^2$, $2\alpha\beta$, $\beta^2$, respectively. The
convergence is very fast, as indicated by (6.7). In practice, equilibrium
will be reached after three or four generations. To be sure, small
chance fluctuations will be superimposed on the described changes, but
the latter represent the prevailing systematic tendency. 
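
The speed of this convergence can be checked numerically. The following Python sketch iterates the recurrence $p'_n = p_{n-1}$, $p_n = \frac{1}{2}(p_{n-1} + p'_{n-1})$ taken from (6.1) and (6.3) and compares it with the closed form (6.7), using exact rational arithmetic. The starting frequencies 9/10 and 3/10 and the function names are illustrative assumptions, not data from the text.

```python
from fractions import Fraction

def female_male_frequencies(p, p_male, generations):
    """Iterate p'_n = p_{n-1} and p_n = (p_{n-1} + p'_{n-1}) / 2.

    Returns the list of pairs (p_n, p'_n) for n = 0, 1, ..., generations.
    """
    history = [(p, p_male)]
    for _ in range(generations):
        p_f, p_m = history[-1]
        history.append(((p_f + p_m) / 2, p_f))
    return history

def closed_form(p, p_male, n):
    """Formula (6.7): p_n = alpha + (-1)^n (p - p') / (3 * 2^n)."""
    alpha = (2 * p + p_male) / 3
    return alpha + (-1) ** n * (p - p_male) / (3 * 2 ** n)

# Hypothetical initial A-gene frequencies: 9/10 among females, 3/10 among males.
p0, p0_male = Fraction(9, 10), Fraction(3, 10)
history = female_male_frequencies(p0, p0_male, 6)
alpha = (2 * p0 + p0_male) / 3  # limiting frequency from (6.6)

for n, (p_n, p_n_male) in enumerate(history):
    assert p_n == closed_form(p0, p0_male, n)
    # generation, female frequency, male frequency, distance from the limit
    print(n, float(p_n), float(p_n_male), float(abs(p_n - alpha)))
```

With these starting values the distance of the female frequency from $\alpha$ is halved in every generation, which is the behavior described above.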

Our main conclusion is that under random mating we can expect the
sex-linked genotypes A and a among males, and AA, Aa, aa among
females to occur approximately with the frequencies $\alpha$, $\beta$, $\alpha^2$, $2\alpha\beta$, $\beta^2$,
respectively, where $\alpha + \beta = 1$.

6 In the terminology introduced in footnote 4 we can interpret $p_n$ and $q_n$ as the
expected values of the gene frequencies in the nth female generation. With this
interpretation the formulas for $p_n$ and $q_n$ are no longer approximations but exact.


Application. Many sex-linked genes, like colorblindness, are reces- 
sive and cause defects. Let a be such a gene. Then all a-males and 
all aa-females show the defect. Females of Aa-type may transmit the 
defect to their offspring but are not themselves affected. Hence we 
expect that a recessive sex-linked defect which occurs among males with
frequency $\alpha$ occurs among females with frequency $\alpha^2$. If one man in 100
is colorblind, one woman in 10,000 should be affected. 


*7. SELECTION


As a typical example of the influence of selection we shall investigate 
the case where aa-individuals cannot multiply. This happens when 
the a-gene is recessive and lethal, so that aa-individuals are born but 
cannot survive. Another case occurs when artificial interference by
breeding or laws prohibits mating of aa-individuals. 

Assume random mating among AA- and Aa-individuals but no mat- 
ing of aa-types. Let the frequencies with which the genotypes AA, 
Aa, aa appear in the total population be u, 2v, w. The corresponding 
frequencies for parents are then

(7.1)  $u^* = \frac{u}{1 - w}, \quad 2v^* = \frac{2v}{1 - w}, \quad w^* = 0.$

We can proceed as in section 5, but we must use the quantities (7.1)
instead of u, 2v, w. Hence, (5.1) is to be replaced by

(7.2)  $p = \frac{u + v}{1 - w}, \quad q = \frac{v}{1 - w}.$


The probabilities of the three genotypes in the first filial generation 
are again given by (5.2), or $u_1 = p^2$, $2v_1 = 2pq$, $w_1 = q^2$.

As before, in order to investigate the systematic changes from generation
to generation, we have to replace u, v, w by $u_1$, $v_1$, $w_1$ and thus
obtain probabilities $u_2$, $v_2$, $w_2$ for the second descendant generation,
etc. In general we get from (7.2) 


(7.3)  $p_n = \frac{u_n + v_n}{1 - w_n}, \quad q_n = \frac{v_n}{1 - w_n},$

and

(7.4)  $u_{n+1} = p_n^2, \quad 2v_{n+1} = 2p_nq_n, \quad w_{n+1} = q_n^2.$


* This section treats a special subject and may be omitted.

A comparison of (7.3) and (7.4) shows that

(7.5)  $p_{n+1} = \frac{u_{n+1} + v_{n+1}}{1 - w_{n+1}} = \frac{p_n}{1 - q_n^2} = \frac{1}{1 + q_n}$

and similarly

(7.6)  $q_{n+1} = \frac{v_{n+1}}{1 - w_{n+1}} = \frac{q_n}{1 + q_n}.$

From (7.6) we can calculate $q_n$ explicitly. In fact

(7.7)  $\frac{1}{q_{n+1}} = 1 + \frac{1}{q_n},$

whence successively

(7.8)  $\frac{1}{q_1} = 1 + \frac{1}{q}, \quad \frac{1}{q_2} = 2 + \frac{1}{q}, \quad \ldots, \quad \frac{1}{q_n} = n + \frac{1}{q},$

or

(7.9)  $q_n = \frac{q}{1 + nq}, \quad w_{n+1} = \left(\frac{q}{1 + nq}\right)^2.$


We see that the unproductive (or undesirable) genotype gradually 
drops out, but the process is extremely slow. For q = 0.1 it takes ten 
generations to reduce the frequency of a-genes by one-half; this reduces 
the frequency of the aa-type approximately from 1 to 1/4 per cent. (If
a is sex-linked, the elimination proceeds much faster as shown in prob- 
lem 29; for a generalized selection scheme see problem 30.) 
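
The slowness of the elimination is easy to reproduce. The sketch below iterates (7.6) with exact rationals and checks each step against the closed form (7.9); the function name `a_gene_frequency` is an illustrative assumption, and q = 1/10 is the value used in the text.

```python
from fractions import Fraction

def a_gene_frequency(q, generations):
    """Iterate (7.6), q_{n+1} = q_n / (1 + q_n), starting from q_0 = q."""
    values = [q]
    for _ in range(generations):
        values.append(values[-1] / (1 + values[-1]))
    return values

q = Fraction(1, 10)
values = a_gene_frequency(q, 10)

for n, q_n in enumerate(values):
    assert q_n == q / (1 + n * q)  # closed form (7.9)

print(float(values[10]))       # a-gene frequency after ten generations
print(float(values[10] ** 2))  # corresponding aa frequency, q_10 squared
```

After ten generations the gene frequency has dropped from 1/10 to 1/20, and the aa frequency from 1/100 to 1/400.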


8. PROBLEMS FOR SOLUTION 


1. Three dice are rolled. If no two show the same face, what is the probabil- 
ity that one is an ace? 

2. Given that a throw with ten dice produced at least one ace, what is the 
probability p of two or more aces? 

3. Bridge. In a bridge party West has no ace. What probability should 
be attributed to the event of his partner having (a) no ace, (b) two or more 
aces? Verify the result by a direct argument. 

4. Bridge. North and South have ten trumps between them (trumps being 
cards of a specified suit). (a) Find the probability that all three remaining 
trumps are in the same hand (either East or West has no trumps). (b) If it is 
known that the king of trumps is included among the three, what is the proba- 
bility that he is “unguarded” (that is, one player has the king, the other the
remaining two trumps)?


7 For a further analysis of various eugenic effects (which are frequently different
from the ideas of enthusiastic proponents of sterilization laws) see G. Dahlberg,
Mathematical methods for population genetics, New York and Basel, 1948.

5. Discuss the key problem in example II(7.b) in terms of conditional proba-
bilities following the pattern of example (2.b).

6. In a bolt factory machines A, B, C manufacture, respectively, 25, 35, and
40 per cent of the total. Of their output 5, 4, and 2 per cent are defective
bolts. A bolt is drawn at random from the produce and is found defective.
What are the probabilities that it was manufactured by machines A, B, C?

7. Suppose that 5 men out of 100 and 25 women out of 10,000 are color-
blind. A colorblind person is chosen at random. What is the probability of 
his being male? (Assume males and females to be in equal numbers.) 

8. Seven balls are distributed randomly in seven cells; the probabilities of 
the various arrangements are tabulated in table 1 of chapter II, section 5. 
Using this table, verify that the probability of a cell’s being triply occupied, 
given that exactly two cells are empty, is 1/4 to five decimals. Show that 1/4 is
the correct answer.

9. A die is thrown as long as necessary for an ace to turn up. Assuming 
that the ace does not turn up at the first throw, what is the probability that 
more than three throws will be necessary? 

10. Continuation. Suppose that the number, n, of throws is even. What 
is the probability that n = 2? 

11. Let⁸ the probability $p_n$ that a family has exactly n children be $\alpha p^n$ when
$n \geq 1$, and $p_0 = 1 - \alpha p(1 + p + p^2 + \cdots)$. Suppose that all sex distribu-
tions of n children have the same probability. Show that for $k \geq 1$ the proba-
bility that a family contains exactly k boys is $2\alpha p^k/(2 - p)^{k+1}$.

12. Continuation. Given that a family includes at least one boy, what is 
the probability that there are two or more? 

13. Die A has four red and two white faces, whereas die B has two red and 
four white faces. A coin is flipped once. If it falls heads, the game continues 
by throwing die A alone; if it falls tails, die B is to be used. (a) Show that the 
probability of red at any throw is 1/2. (b) If the first two throws resulted in red,
what is the probability of red at the third throw? (c) If red turns up at the 
first n throws, what is the probability that die A is being used? (d) To which 
urn model is this game equivalent? 

14. In example (2.a) let $x_n$ be the probability that the winner of the nth
trial wins the entire game; let $y_n$ and $z_n$ be the probabilities of victory for the
losing and the pausing player, respectively, of the nth trial. (a) Show that

(*)  $x_n = \frac{1}{2} + \frac{1}{2}y_{n+1}, \quad y_n = \frac{1}{2}z_{n+1}, \quad z_n = \frac{1}{2}x_{n+1}.$

(b) Show by a direct simple argument that in reality $x_n = x$, $y_n = y$, $z_n = z$
are independent of n. (c) Conclude that the probability that player a wins
the game is 5/14 (in agreement with problem I, 5). (d) Show that $x_n = 4/7$,
$y_n = 1/7$, $z_n = 2/7$ is the only bounded solution of (*).

15. Let the events $A_1, A_2, \ldots, A_n$ be independent and $P\{A_k\} = p_k$. Find
the probability p that none of the events occurs.


8 According to A. J. Lotka, American family statistics satisfies our hypothesis
with p = 0.7358. See Théorie analytique des associations biologiques II, Actualités
scientifiques et industrielles, no. 780, Paris, 1939.

16. Continuation. Show that always $p < e^{-\sum p_k}$.

17. Continuation. From Bonferroni’s inequality IV(5.7) deduce that the
probability of k or more of the events $A_1, \ldots, A_n$ occurring simultaneously
is less than $(p_1 + \cdots + p_n)^k/k!$.

18. To Polya’s urn scheme, example (2.c). Given that the second ball was 
black, what is the probability that the first was black? 

19. To Polya’s urn scheme, example (2.c). Show by induction that the proba- 
bility of a black ball at any trial is b/(b + r).

20. Continuation. Prove by induction: for any m < n the probabilities 
that the mth and the nth drawings produce black, black or black, red are 


$\frac{b(b + c)}{(b + r)(b + r + c)}, \quad \frac{br}{(b + r)(b + r + c)},$


respectively. Generalize to more than two drawings. 

21. Time symmetry of Polya’s scheme. Let A and B stand for either black 
or red (so that AB can be any of the four combinations). Show that the proba- 
bility of A at the nth drawing, given that the mth drawing yields B, is the 
same as the probability of A at the mth drawing when the nth drawing yields B. 

22. In the Polya scheme let $p_k(n)$ be the probability of k black balls in the
first n drawings. Prove the recurrence relation

$p_k(n + 1) = p_k(n) \frac{r + (n - k)c}{b + r + nc} + p_{k-1}(n) \frac{b + (k - 1)c}{b + r + nc},$

where $p_{-1}(n)$ is to be interpreted as 0. Use this relation for a new proof of
(2.3).
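23. The Polya distribution. In (2.4) set

$\frac{b}{b + r} = p, \quad \frac{r}{b + r} = q, \quad \frac{c}{b + r} = \gamma.$

Show that

(8.2)  $p_{n_1,n} = \binom{-p/\gamma}{n_1} \binom{-q/\gamma}{n_2} \Big/ \binom{-1/\gamma}{n}, \quad n = n_1 + n_2,$

remains meaningful for arbitrary (not necessarily rational) constants p > 0,
q > 0, $\gamma > -1$ such that p + q = 1. Verify that $p_{n_1,n} > 0$ and

$\sum_{\nu=0}^{n} p_{\nu,n} = 1.$

Thus equation (8.2) defines a probability distribution on the integers 0, 1, ...,
n, the Polya distribution.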

24. Limiting form of the Polya distribution. If $n \to \infty$, $p \to 0$, $\gamma \to 0$ so
that $np \to \lambda$, $n\gamma \to \rho^{-1}$, then

$p_{n_1,n} \to \binom{\lambda\rho + n_1 - 1}{n_1} \left(\frac{\rho}{1 + \rho}\right)^{\lambda\rho} \left(\frac{1}{1 + \rho}\right)^{n_1}.$

Verify this and show that for fixed $\lambda$, $\rho$ the terms on the right add to unity.
(The right side represents the so-called negative binomial distribution; cf. chap-
ter VI, section 8, and problem VI, 37.)

25. Interpret equation II(11.8) in terms of conditional probabilities. 


Applications in Biology 
26. Under random mating less than half the population belongs to genotype
Aa.

27. Generalize the results of section 5 to the case where each gene can have
any of the forms $A_1, A_2, \ldots, A_k$, so that there are $\binom{k+1}{2}$ genotypes instead
of three (multiple alleles).

28. Brother-sister mating. Two parents are selected at random from a popu- 
lation in which the genotypes AA, Aa, aa occur with frequencies u, 2v, w. 
This process is repeated in their progeny. Find the probabilities that both 
parents of the first, second, third filial generation belong to AA [cf. examples
XV(2.) and XVI(4.b)].

29. Selection. Let a be a recessive sex-linked gene, and suppose that a 
selection process makes mating of a-males impossible. If the genotypes AA, 
Aa, aa appear among females, with frequencies u, 2v, w, show that for female 
descendants of the first generation $u_1 = u + v$, $2v_1 = v + w$, $w_1 = 0$ and hence
$p_1 = p + \frac{1}{2}q$, $q_1 = \frac{1}{2}q$. That is to say, the frequency of the a-gene among
females is reduced to one-half. 

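30. The selection problem of section 7 can be generalized by assuming that
only the fraction $\lambda$ ($0 < \lambda < 1$) of the aa-class is eliminated. Show that

$p = \frac{u + v}{1 - \lambda w}, \quad q = \frac{v + (1 - \lambda)w}{1 - \lambda w}.$

More generally, (7.3) is to be replaced by

$p_{n+1} = \frac{p_n}{1 - \lambda q_n^2}, \quad q_{n+1} = q_n \frac{1 - \lambda q_n}{1 - \lambda q_n^2}.$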

(The general solution of these equations appears to be unknown.) 

31. Consider simultaneously two pairs of genes with possible forms (A, a) 
and (B, b), respectively. Any person transmits to each descendant one gene 
of each pair, and we shall suppose that each of the four possible combinations 
has probability 1/4. (This is the case if the genes are on separate chromosomes;
otherwise there is strong dependence.) There exist nine genotypes, and we as-
sume that their frequencies in the parent population are $U_{AABB}$, $U_{aaBB}$, $U_{AAbb}$,
$U_{aabb}$, $2U_{AaBB}$, $2U_{Aabb}$, $2U_{AABb}$, $2U_{aaBb}$, $4U_{AaBb}$. Put
$p_{AB} = U_{AABB} + U_{AABb} + U_{AaBB} + U_{AaBb}$,
$p_{Ab} = U_{AAbb} + U_{AABb} + U_{Aabb} + U_{AaBb}$,
$p_{aB} = U_{aaBB} + U_{AaBB} + U_{aaBb} + U_{AaBb}$,
$p_{ab} = U_{aabb} + U_{aaBb} + U_{Aabb} + U_{AaBb}$.
Compute the corresponding quantities for the first descendant
generation. Show that for it $p^{(1)}_{AB} = p_{AB} - \delta$, $p^{(1)}_{Ab} = p_{Ab} + \delta$,
$p^{(1)}_{aB} = p_{aB} + \delta$, $p^{(1)}_{ab} = p_{ab} - \delta$ with $2\delta = p_{AB}p_{ab} - p_{Ab}p_{aB}$.
The stationary distribution is given by $p_{AB} - 2\delta$, $p_{Ab} + 2\delta$, etc.
(Notice that Hardy’s law does not apply;
the composition changes from generation to generation.)


32. Assume that the genotype frequencies in a population are $u = p^2$,
2v = 2pq, $w = q^2$. Given that a man is of genotype Aa, the probability that
his brother is of the same genotype is (1 + pq)/2. 


Note: The following problems are on family relations and give a meaning to the 
notion of degree of relationship. Each problem is a continuation of the preceding 
one. Random mating and the notations of section 5 are assumed. We are here con- 
cerned with a special case of Markov chains (cf. chapter XV). Matrix algebra simplifies
the writing. 


33. Number the genotypes AA, Aa, aa by 1, 2, 3, respectively, and let 
$p_{ik}$ (i, k = 1, 2, 3) be the conditional probability that an offspring is of geno-
type k if it is known that the male (or female) parent is of genotype i. Com-
pute the nine probabilities $p_{ik}$, assuming that the probabilities for the other
parent to be of genotype 1, 2, 3 are $p^2$, 2pq, $q^2$, respectively.

34. Show that $p_{ik}$ is also the conditional probability that the parent is of
genotype k if it is known that a specified offspring is of genotype i. 

35. Prove that the conditional probability of a grandson (grandfather) to 
be of genotype k if it is known that the grandfather (grandson) is of genotype 
i is given by 

$p^{(2)}_{ik} = p_{i1}p_{1k} + p_{i2}p_{2k} + p_{i3}p_{3k}.$

[The matrix $(p^{(2)}_{ik})$ is the square of the matrix $(p_{ik})$.]

36. Show that $p^{(2)}_{ik}$ is also the conditional probability that a man is of geno-
type k if it is known that a specified half-brother is of genotype i.

37. Show that the conditional probability of a man to be of genotype k 
when it is known that a specified great-grandfather (or great-grandson) is of 
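genotype i is given by

$p^{(3)}_{ik} = p^{(2)}_{i1}p_{1k} + p^{(2)}_{i2}p_{2k} + p^{(2)}_{i3}p_{3k} = p_{i1}p^{(2)}_{1k} + p_{i2}p^{(2)}_{2k} + p_{i3}p^{(2)}_{3k}.$

(The matrix $(p^{(3)}_{ik})$ is the third power of the matrix $(p_{ik})$. This procedure gives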
a precise meaning to the notion of the degree of family relationship.) 

38. More generally, define probabilities $p^{(n)}_{ik}$ that a descendant of nth genera-
tion is of genotype k if a specified ancestor was of genotype i. Prove by induc-
tion that the $p^{(n)}_{ik}$ are given by the elements of the following matrix

$\begin{pmatrix}
p^2 + pq/2^{n-1} & 2pq + q(q - p)/2^{n-1} & q^2 - q^2/2^{n-1} \\
p^2 + p(q - p)/2^n & 2pq + (1 - 4pq)/2^n & q^2 + q(p - q)/2^n \\
p^2 - p^2/2^{n-1} & 2pq + p(p - q)/2^{n-1} & q^2 + pq/2^{n-1}
\end{pmatrix}$

(This shows that the influence of an ancestor decreases from generation to
generation by the factor 1/2.)


9 The first edition contained an error since the word brother (two common parents)
was used where a half-brother was meant. This error is pointed out and the cor-
rect formulas are given in C. C. Li and Louis Sacks, The derivation of the joint
distribution and correlation between relatives by the use of stochastic matrices,
Biometrika, vol. 40 (1954), pp. 347-360.

39. Consider the problem 36 for a full brother instead of a half-brother.
Show that the corresponding matrix is

$\begin{pmatrix}
\frac{1}{4}(1 + p)^2 & \frac{1}{2}q(1 + p) & \frac{1}{4}q^2 \\
\frac{1}{4}p(1 + p) & \frac{1}{2}(1 + pq) & \frac{1}{4}q(1 + q) \\
\frac{1}{4}p^2 & \frac{1}{2}p(1 + q) & \frac{1}{4}(1 + q)^2
\end{pmatrix}$


40. Show that the degree of relationship between uncle and nephew is the 
same as between grandfather and grandson. 




CHAPTER VI 


The Binomial 
and the Poisson Distributions 


1. BERNOULLI TRIALS¹


Repeated independent trials are called Bernoulli trials if there are only 
two possible outcomes for each trial and their probabilities remain the same 
throughout the trials. It is usual to denote the two probabilities by p 
and q, and to refer to the outcome with probability p as “success,” S,
and to the other as “failure,” F. Clearly, p and q must be non-nega- 
tive, and 


(1.1) p+q=1. 


The sample space of each individual trial is formed by the two points 
S and F. The sample space of n Bernoulli trials contains $2^n$ points or
successions of n symbols S and F, each point representing one possible
outcome of the compound experiment. Since the trials are independ-
ent, the probabilities multiply. In other words, the probability of any
specified sequence is the product obtained on replacing the symbols S and F
by p and q, respectively. Thus P{(SSFSF ... FFS)} = ppqpq ... qqp.
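
As a small illustration of this product rule, the Python sketch below computes the probability of a specified sequence and confirms that the $2^n$ sample points together carry total probability one. The function name `sequence_probability` and the choice p = 1/6 are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def sequence_probability(sequence, p):
    """Replace each S by p and each F by q = 1 - p and multiply."""
    q = 1 - p
    result = Fraction(1)
    for symbol in sequence:
        result *= p if symbol == "S" else q
    return result

p = Fraction(1, 6)  # illustrative value, e.g. "ace" with a good die
prob = sequence_probability("SSFSF", p)
assert prob == p ** 3 * (1 - p) ** 2

# The sample space of n = 5 trials has 2^5 points; their probabilities add to 1.
points = list(product("SF", repeat=5))
total = sum(sequence_probability(point, p) for point in points)
print(len(points), total)
```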


Examples. The most familiar example of Bernoulli trials is pro- 
vided by successive tosses of a true or symmetric coin; here p = q = 1/2.
If the coin is unbalanced, we still assume that the successive tosses are 
independent so that we have a model of Bernoulli trials in which the 
probability p for success can have an arbitrary value. Repeated ran- 
dom drawings from an urn containing at each drawing r red and b black 
balls represent Bernoulli trials with p = r/(r +b). Often we have no 
interest in distinguishing among several outcomes and prefer to de- 
scribe any result simply as A or non-A. Thus with good dice the dis- 
tinction between ace (S) and non-ace (F) leads to Bernoulli trials with 

p = 1/6, whereas distinguishing between even or odd leads to Bernoulli
trials with p = 1/2. If the die is unbalanced, the successive throws still
form Bernoulli trials, but the corresponding probabilities p are dif-
ferent. Royal flush in poker or double ace in rolling dice may represent
success; calling all other outcomes failure, we have Bernoulli trials with
p = 1/649,740 and p = 1/36, respectively. Reductions of this type are
usual in statistical applications. For example, washers produced in
mass production may vary in thickness, but, on inspection, they are
classified as conforming (S) or defective (F) according as their thick-
ness is, or is not, within prescribed limits.

1 James Bernoulli (1654-1705). His main work, the Ars conjectandi, was
published in 1713.


The Bernoulli scheme of trials is a theoretical model, and only ex- 
perience can show whether it is suitable for the description of specified 
experiments. Our knowledge that successive tossings of a coin con- 
form to the Bernoulli scheme is derived from experimental evidence. 
The man in the street, and also the philosopher K. Marbe, believes 
that after a run of seventeen heads tail becomes more probable. This 
argument has nothing to do with imperfections of physical coins; it 
endows nature with memory, or, in our terminology, it denies the 
stochastic independence of successive trials. Marbe’s theory cannot 
be refuted by logic but is rejected because of lack of empirical support. 

In sampling practice, industrial quality control, etc., the scheme of
Bernoulli trials provides an ideal standard even though it can never 
be fully attained. Thus, in the example above of the production of 
washers, there are many reasons why the output cannot conform to 
the Bernoulli scheme. The machines are subject to changes, and hence 
the probabilities do not remain constant; there is a persistence in the 
action of machines, and therefore long runs of deviations of like kind 
are more probable than they would be if the trials were truly independ- 
ent. From the point of view of quality control, however, it is desirable 
that the process conform to the Bernoulli scheme, and it is an important 
discovery that, within certain limits, production can be made to behave 
in this way. The purpose of continuous control is then to discover at 
an early stage flagrant departures from the ideal scheme and to use 
them as an indication of impending trouble. 


2. THE BINOMIAL DISTRIBUTION 
Frequently we are interested only in the total number of successes 
produced in a succession of n Bernoulli trials but not in their order. 


2 Die Gleichförmigkeit in der Welt, Munich, 1916. There exists a huge critical
literature on Marbe's theory.




The number of successes can be 0, 1, ..., n, and our first problem is 
to determine the corresponding probabilities. Now the event “n trials
result in k successes and n - k failures” can happen in as many ways
as k letters S can be distributed among n places. In other words, our
event contains $\binom{n}{k}$ points, and, by definition, each point has the prob-
ability $p^kq^{n-k}$. This proves the


Theorem. Let b(k; n, p) be the probability that n Bernoulli trials with
probabilities p for success and q = 1 - p for failure result in k successes
and n - k failures ($0 \leq k \leq n$). Then

(2.1)  $b(k; n, p) = \binom{n}{k} p^k q^{n-k}.$

In particular, the probability of no success is $q^n$, and the probability
of at least one success is $1 - q^n$.


We shall treat p as a constant and denote the number of successes 
in n trials by $S_n$; then $b(k; n, p) = P\{S_n = k\}$. In the general ter-
minology $S_n$ is a random variable, and the function (2.1) is the “distri-
bution” of this random variable; we shall refer to it as the binomial
distribution. The attribute “binomial” refers to the fact that (2.1) rep-
resents the kth term of the binomial expansion of $(q + p)^n$. This re-
mark shows also that $b(0; n, p) + b(1; n, p) + \cdots + b(n; n, p) = (q + p)^n = 1$,
as is required by the notion of probability. The
binomial distribution has been tabulated.³
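
Formula (2.1) translates directly into code. The following is a minimal Python sketch, assuming the function name `b` (chosen to mirror the notation of the text) and illustrative parameters n = 12, p = 1/3; exact rationals make the check that the terms add to unity an identity rather than an approximation.

```python
from fractions import Fraction
from math import comb

def b(k, n, p):
    """Binomial probability (2.1): C(n, k) p^k q^(n-k) with q = 1 - p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 12, Fraction(1, 3)  # illustrative parameters
terms = [b(k, n, p) for k in range(n + 1)]

# The terms are those of the expansion of (q + p)^n and therefore add to 1.
assert sum(terms) == 1

# Probability of no success is q^n; at least one success has 1 - q^n.
q = 1 - p
assert terms[0] == q ** n
at_least_one = 1 - terms[0]
print(float(at_least_one))
```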


Examples. (a) Weldon’s dice data. Let an experiment consist in 
throwing twelve dice and let us count fives and sixes as “success.” If 
the dice are perfect, the probability of success is p = 1/3 and the number
of successes should follow the binomial distribution b(k; 12, 1/3). Table
1 gives these probabilities, together with the corresponding observed 
average frequencies in 26,306 actual experiments. The agreement looks 
good, but for such extensive data it is really very bad. Statisticians 
usually judge closeness of fit by the chi-square criterion. According 
to it, deviations as large as those observed would happen with true dice 
only once in 10,000 times. It is, therefore, reasonable to assume that 


the dice were biased. A bias with probability of success p = 0.3377
would fit the observations.⁴

3 For n < 50, see National Bureau of Standards, Tables of the binomial probabil-
ity distribution, Applied Mathematics Series, vol. 6 (1950). For $50 \leq n \leq 100$, see
H. C. Romig, 50-100 Binomial tables, John Wiley and Sons, 1953. For a wider
range see Tables of the cumulative binomial probability distribution, by the Harvard
Computation Laboratory, 1955, and Tables of the cumulative binomial probabilities,
by the Ordnance Corps, ORDP 20-11 (1952).


TABLE 1

WELDON'S DICE DATA

                        Observed
 k    b(k; 12, 1/3)     frequency     b(k; 12, 0.3377)
 0     0.007 707        0.007 033       0.007 123
 1      .046 244         .043 678        .043 584
 2      .127 171         .124 116        .122 225
 3      .211 952         .208 127        .207 736
 4      .238 446         .232 418        .238 324
 5      .190 757         .197 445        .194 429
 6      .111 275         .116 589        .115 660
 7      .047 689         .050 597        .050 549
 8      .014 903         .015 320        .016 109
 9      .003 312         .003 991        .003 650
10      .000 497         .000 532        .000 558
11      .000 045         .000 152        .000 052
12      .000 002         .000 000        .000 002

(b) In chapter IV, section 4, we have encountered the binomial dis- 
tribution in connection with a card-guessing problem, and the columns 
$b_m$ of table 3 exhibit the terms of the distribution for n = 3, 4, 5, 6, 10
and p = 1/n. In the occupancy problem II(4.c) we found formula 
II(4.5), which is another special case of the binomial distribution. 

(c) If the probability of success is 0.01, how many trials are neces- 
sary in order for the probability of at least one success to be 1/2 or more?
Here we seek the smallest integer n for which $1 - (0.99)^n \geq \frac{1}{2}$, or
$-n \log(0.99) \geq \log 2$; since $\log 2/(-\log 0.99) \approx 68.97$, this gives $n \geq 69$.
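
The threshold in example (c) can also be found by direct search, which avoids any rounding question in the logarithms. This is a minimal Python sketch; the variable names are illustrative.

```python
from math import ceil, log

p_success = 0.01

# Smallest n with 1 - (1 - p)^n >= 1/2, found by stepping n upward.
n = 1
while 1 - (1 - p_success) ** n < 0.5:
    n += 1

# The same threshold from the logarithmic form of the inequality.
n_from_logs = ceil(log(2) / -log(1 - p_success))

print(n, n_from_logs, 1 - (1 - p_success) ** n)
```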

(d) A power supply problem. Suppose that n = 10 workers are to 
use intermittently electric power, and we are interested in estimating 
the total load to be expected. For a crude approximation imagine that 
at any given time each worker has the same probability p of requiring 
a unit of power. If they work independently, the probability of exactly 
k workers requiring power at the same time should be b(k;n, p). If, 
on the average, a worker uses power for 12 minutes per hour, we would 


put p = 1/5. The probability of seven or more workers requiring current
at the same time is then b(7; 10, 0.2) + ... + b(10; 10, 0.2) =
0.0008643584. In other words, if the supply is adjusted to six power
units, an overload has probability 0.00086... and should be expected
for about one minute in 1157, that is, about one minute in twenty
hours. The probability of eight or more workers requiring current
at the same time is only 0.0000779264 or about eleven times less.

4 R. A. Fisher, Statistical methods for research workers, Edinburgh-London, 1932,
p. 66, or T. C. Fry, Probability and its engineering uses, New York, 1928, pp. 303ff.
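
The overload probabilities of example (d) are upper tails of b(k; 10, 0.2) and can be recomputed as follows. The helper name `upper_tail` is an assumption introduced for this sketch.

```python
from math import comb

def b(k, n, p):
    """Binomial probability (2.1)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(r, n, p):
    """Probability of r or more successes in n Bernoulli trials."""
    return sum(b(k, n, p) for k in range(r, n + 1))

overload_six = upper_tail(7, 10, 0.2)    # supply of six units is exceeded
overload_seven = upper_tail(8, 10, 0.2)  # supply of seven units is exceeded

print(overload_six, overload_seven)
print(1 / overload_six)                  # minutes per expected overload minute
print(overload_six / overload_seven)     # ratio of the two probabilities
```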

(e) Testing sera or vaccines.⁵ Suppose that the normal rate of infec-
tion of a certain disease in cattle is 25 per cent. To test a newly dis-
covered serum n healthy animals are injected with it. How are we to
evaluate the result of the experiment? If the serum is absolutely
worthless, the probability that exactly k of the n test animals remain
free from infection may be equated to b(k; n, 0.75). For k = n = 10
this probability is about 0.056, and for k = n = 12 only 0.032. Thus, 
if out of ten or twelve test animals none catches infection, this may be 
taken as an indication that the serum has had an effect, although it is 
not a conclusive proof. Note that, without serum, the probability that 
out of seventeen animals at most one catches infection is about 0.0501. 
It is therefore stronger evidence in favor of the serum if out of seventeen 
test animals only one gets infected than if out of ten all remain healthy. 
For n = 23 the probability of at most two animals catching infection 
is about 0.0492, and thus two failures out of twenty-three is again 
better evidence for the serum than one out of seventeen or none out 
of ten. 
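
The four probabilities quoted in example (e) can be checked with a few lines of Python. The function name `at_most_infected` is an illustrative assumption; the infection rate 0.25 is the one given in the text.

```python
from math import comb

def at_most_infected(j, n, infection_rate=0.25):
    """Probability that at most j of n untreated animals catch the infection."""
    healthy = 1 - infection_rate
    return sum(
        comb(n, i) * infection_rate ** i * healthy ** (n - i)
        for i in range(j + 1)
    )

# (number of animals, number of infections tolerated)
cases = [(10, 0), (12, 0), (17, 1), (23, 2)]
results = {case: at_most_infected(case[1], case[0]) for case in cases}
for (n, j), value in results.items():
    print(f"n = {n:2d}, at most {j} infected: {value:.4f}")
```

A smaller probability under the hypothesis of a worthless serum means stronger evidence for the serum, which gives the ordering stated in the text.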

(f) Another statistical test. Suppose n people have their blood pres- 
sure measured with and without a certain drug. Let the observations 
be $x_1, \ldots, x_n$ and $x'_1, \ldots, x'_n$. We say that the ith trial resulted in
success if $x_i < x'_i$ and in failure if $x_i > x'_i$. (For simplicity we may
assume that no two measurements lead to exactly the same result.) If
the drug has no effect, then our observation should correspond to n
Bernoulli trials with p = 1/2, and an excessive number of successes is
to be taken as evidence that the drug has an effect. 
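
A sketch of how the test of example (f) might be carried out is given below. The blood-pressure readings are invented purely for illustration, the function name `sign_test` is an assumption, and ties are assumed absent as in the text. The function counts successes and returns the probability of at least that many successes in n Bernoulli trials with p = 1/2.

```python
from math import comb

def sign_test(with_drug, without_drug):
    """Return (number of successes, P{S_n >= that number}) for p = 1/2.

    Trial i is a success when the reading with the drug is the smaller one.
    """
    n = len(with_drug)
    successes = sum(x < x_prime for x, x_prime in zip(with_drug, without_drug))
    tail = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n
    return successes, tail

# Hypothetical readings for ten people (not data from the text).
with_drug = [120, 132, 128, 141, 135, 150, 126, 138, 144, 129]
without_drug = [125, 139, 130, 150, 133, 158, 131, 140, 151, 136]

successes, tail = sign_test(with_drug, without_drug)
print(successes, tail)
```

A small tail probability indicates that so many successes would be unlikely if the drug had no effect.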


5 P. V. Sukhatme and V. G. Panse, Size of experiments for testing sera or vac-
cines, Indian Journal of Veterinary Science and Animal Husbandry, vol. 13 (1943),
pp. 75-82.


3. THE CENTRAL TERM AND THE TAILS
From (2.1) we see that

(3.1)  $\frac{b(k; n, p)}{b(k - 1; n, p)} = \frac{(n - k + 1)p}{kq} = 1 + \frac{(n + 1)p - k}{kq}.$

Accordingly, the term b(k; n, p) is greater than the preceding one for
k < (n + 1)p and is smaller for k > (n + 1)p. If (n + 1)p = m hap-
pens to be an integer, then b(m; n, p) = b(m - 1; n, p). There exists
exactly one integer m such that

(3.2)  $(n + 1)p - 1 < m \leq (n + 1)p,$
and we have the 


Theorem 1. As k goes from 0 to n, the terms b(k; n, p) first increase 
monotonically, then decrease monotonically, reaching their greatest value 
when k = m, except that b(m—1;n, p) = b(m; n, p) when m = (n + 1)p. 


We shall call b(m;n, p) the central term. Often m is called “the 
most probable number of successes,” but it must be understood that 
for large values of n all terms b(k; n, p) are small. In 100 tossings of 
a true coin the most probable number of heads is 50, but its probability 
is less than 0.08. In the next chapter we shall find that b(m; n, p) is
approximately $(2\pi npq)^{-1/2}$.
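
The coin-tossing figures just quoted can be verified directly. In the Python sketch below, `central_index` is an assumed helper that returns the integer m of (3.2); it uses floating-point arithmetic, so exact rationals are safer when (n + 1)p is itself an integer, as in the second part of the example.

```python
from fractions import Fraction
from math import comb, floor, pi, sqrt

def b(k, n, p):
    """Binomial probability (2.1)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def central_index(n, p):
    """Integer m with (n + 1)p - 1 < m <= (n + 1)p, as in (3.2)."""
    return floor((n + 1) * p)

n, p = 100, 0.5
m = central_index(n, p)
central = b(m, n, p)
approx = 1 / sqrt(2 * pi * n * p * (1 - p))  # (2 pi n p q)^(-1/2)
print(m, central, approx)

# No other term exceeds the central term.
assert all(b(k, n, p) <= central for k in range(n + 1))

# When (n + 1)p is an integer the two neighboring terms coincide.
half = Fraction(1, 2)
tie = b(2, 5, half) == b(3, 5, half)  # here (n + 1)p = 3
print(tie)
```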

It is obvious that the ratio in formula (3.1) decreases monotonically
as k increases; thus, when $k \geq r + 1$,

(3.3)  $\frac{b(k; n, p)}{b(k - 1; n, p)} \leq \frac{(n - r)p}{(r + 1)q}.$

Set herein $k = r + 1, \ldots, r + \nu$ and multiply the $\nu$ inequalities to obtain

(3.4)  $\frac{b(r + \nu; n, p)}{b(r; n, p)} \leq \left\{\frac{(n - r)p}{(r + 1)q}\right\}^{\nu}.$

For r > np the fraction within braces is less than unity, and summation
over $\nu$ leads to a finite geometric series with ratio (n - r)p/(r + 1)q.
We conclude that for r > np

(3.5)  $\sum_{\nu=0}^{\infty} b(r + \nu; n, p) \leq b(r; n, p) \frac{(r + 1)q}{(r + 1) - (n + 1)p}.$

On the left we have the right “tail” of the binomial distribution, namely 
the probability of at least r successes. The same calculation applied 
to the left tail shows that for s < np 


(3.6)  $\sum_{\rho=0}^{s} b(\rho; n, p) \leq b(s; n, p) \frac{(n - s + 1)p}{(n + 1)p - s}.$


We have proved

Theorem 2. If r > np, the probability of at least r successes satisfies
the inequality (3.5); if s < np, the probability of at most s successes satis-
fies the inequality (3.6). 


[For an alternative proof see problem 39(a).] 
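
The tail estimates of Theorem 2 are easy to check numerically. The following is only a sketch under stated assumptions: the function names and the values n = 100, p = 0.5 are illustrative choices and do not come from the text. It compares the exact tails with the right sides of (3.5) and (3.6).

```python
from math import comb


def b(k, n, p):
    """Binomial term b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def right_tail(r, n, p):
    """Exact probability of at least r successes."""
    return sum(b(k, n, p) for k in range(r, n + 1))


def right_tail_bound(r, n, p):
    """Right side of (3.5); only meaningful for r > np."""
    q = 1 - p
    return b(r, n, p) * (r + 1) * q / ((r + 1) - (n + 1) * p)


def left_tail(s, n, p):
    """Exact probability of at most s successes."""
    return sum(b(k, n, p) for k in range(0, s + 1))


def left_tail_bound(s, n, p):
    """Right side of (3.6); only meaningful for s < np."""
    return b(s, n, p) * (n - s + 1) * p / ((n + 1) * p - s)


if __name__ == "__main__":
    n, p = 100, 0.5  # illustrative values, not taken from the text
    for r in (55, 60, 70):
        print("r =", r, right_tail(r, n, p), right_tail_bound(r, n, p))
    for s in (45, 40, 30):
        print("s =", s, left_tail(s, n, p), left_tail_bound(s, n, p))
```

In such experiments the bound is typically rather loose near the central term and becomes sharp far out in the tails, where the geometric series is dominated by its first term.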


4. THE LAW OF LARGE NUMBERS 


On several occasions we have mentioned that our intuitive notion of
probability is based on the following assumption. If in n identical trials
A occurs ν times, and if n is very large, then ν/n should be near the
probability p of A. Clearly, a formal mathematical theory can never
refer directly to real life, but it should at least provide theoretical
counterparts to the phenomena which it tries to explain. Accordingly,
we require that the vague introductory remark be made precise in the
form of a theorem. For this purpose we translate “identical trials”
as “Bernoulli trials” with probability p for success. If $S_n$ is the number
of successes in n trials, then $S_n/n$ is the average number of successes
and should be near p. It is now easy to give a precise meaning
to this. Consider, for example, the probability that $S_n/n$ exceeds
p + ε, where ε > 0 is arbitrarily small but fixed. This probability is
the same as $P\{S_n > n(p + \varepsilon)\}$ and equals the left side of (3.5) when
r is the smallest integer exceeding n(p + ε). Then (3.5) implies


(4.1)    $P\{S_n > n(p + \varepsilon)\} < b(r; n, p)\,\dfrac{n(p + \varepsilon) + q}{n\varepsilon}.$


With increasing n the fraction on the right remains bounded, whereas
b(r; n, p) → 0 since b(r; n, p) < b(k; n, p) for each k such that
(n + 1)p < k < r, and there are about nε such terms b(k; n, p), whose
sum cannot exceed unity. It
follows that as n increases, $P\{S_n > n(p + \varepsilon)\} \to 0$. Using formula
(3.6), we see in the same way that $P\{S_n < n(p - \varepsilon)\} \to 0$, and we
have thus


(4.2)    $P\left\{\left|\dfrac{S_n}{n} - p\right| < \varepsilon\right\} \to 1.$


In words: As n increases, the probability that the average number of
successes deviates from p by more than any preassigned ε tends to zero.
This is one form of the law of large numbers and serves as a basis for 
the intuitive notion of probability as a measure of relative frequencies. 
For practical applications it must be supplemented by a more precise 
estimate of the probability on the left side in (4.2); such an estimate 
is provided by the normal approximation to the binomial distribution
[cf. the typical example VII(3.g)]. Actually formula (4.2) is a simple
consequence of the latter (problem VII, 18).

The assertion (4.2) is the classical law of large numbers. It is of very 
limited interest and should be replaced by the more precise and more 
useful strong law of large numbers (see chapter VIII, section 4). 
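
The convergence asserted in (4.2) can be watched directly by summing binomial terms. The sketch below is illustrative only: the helper names, the choice p = 1/2, ε = 1/20 and the values of n are assumptions made for the example. Exact fractions are used in the comparison so that boundary values of k are classified correctly.

```python
from fractions import Fraction
from math import exp, lgamma, log


def b(k, n, p):
    """Binomial term b(k; n, p), computed through logarithms to avoid overflow."""
    log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_term)


def prob_within(n, p, eps):
    """P{|S_n/n - p| < eps}; p and eps are Fractions so the test on k is exact."""
    return sum(b(k, n, float(p)) for k in range(n + 1)
               if abs(Fraction(k, n) - p) < eps)


if __name__ == "__main__":
    p, eps = Fraction(1, 2), Fraction(1, 20)  # illustrative values
    for n in (10, 100, 1000):
        print(n, prob_within(n, p, eps))
```

The printed probabilities increase toward 1 as n grows, in agreement with (4.2); they say nothing about the fluctuations within a single long game.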

Warning. It is usual to read into the law of large numbers things 
which it definitely does not imply. If Peter and Paul toss a perfect 
coin 10,000 times, it is customary to expect that Peter will be in the 
lead roughly half the time. This is not true. The arc sine law (chapter 
III, section 5) states that such an equalization is least probable. The 
probability that Peter leads in less than 20 trials is very much larger than 
the probability that the number of trials in which he leads lies between 
4990 and 5010. There does not exist any tendency for the periods of 
lead to equalize. The law of large numbers asserts only that in a large
number of different coin-tossing games the frequency of those in which
heads lead is, at any given moment, close to ½. Nothing is said about
the fluctuations of the lead within a fixed game.


5. THE POISSON APPROXIMATION *


In many applications we deal with Bernoulli trials where, comparatively
speaking, n is large and p is small, whereas the product

(5.1)    λ = np

is of moderate magnitude. In such cases it is convenient to use an
approximation formula to b(k; n, p) which is due to Poisson and which
we proceed to derive. We have $b(0; n, p) = (1 - p)^{n}$ or, substituting
from (5.1),


(5.2)    $b(0; n, p) = \left(1 - \dfrac{\lambda}{n}\right)^{n}.$

Passing to logarithms and using the Taylor expansion II(8.10), we find

(5.3)    $\log b(0; n, p) = n \log\left(1 - \dfrac{\lambda}{n}\right) = -\lambda - \dfrac{\lambda^{2}}{2n} - \cdots$

so that for large n

(5.4)    $b(0; n, p) \approx e^{-\lambda},$

where the sign ≈ is used to indicate approximate equality (in the present
case up to terms of order of magnitude $n^{-1}$). Furthermore, from
(3.1) it is seen that for any fixed k and sufficiently large n we have

(5.5)    $\dfrac{b(k; n, p)}{b(k-1; n, p)} = \dfrac{\lambda - (k-1)p}{kq} \approx \dfrac{\lambda}{k}.$

For k = 1 we get from this and (5.4) $b(1; n, p) \approx \lambda e^{-\lambda}$. For k = 2
we get $b(2; n, p) \approx \lambda^{2} e^{-\lambda}/2$. Generally we see by induction that

(5.6)    $b(k; n, p) \approx \dfrac{\lambda^{k}}{k!}\, e^{-\lambda}.$

This is the famous Poisson approximation to the binomial distribution.
(See problems 30-34 for an estimate of the error and a proof that the
approximation in (5.6) is uniform when n → ∞ and p → 0 in such a
way that λ = np remains bounded.) It is convenient to have a symbol
for the right-hand member in (5.6), and we shall put

(5.7)    $p(k; \lambda) = e^{-\lambda}\, \dfrac{\lambda^{k}}{k!}.$

With this notation p(k; λ) should be an approximation to b(k; n, λ/n)
when n is sufficiently large.


*Siméon D. Poisson (1781-1840). His book, Recherches sur la probabilité des
jugements en matière criminelle et en matière civile, précédées des règles générales du
calcul des probabilités, appeared in 1837.
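
How quickly b(k; n, λ/n) approaches the Poisson expression can be seen from a short computation. This is a sketch only; the function names and the choice λ = 1 with n = 10, 100, 1000 are assumptions made for illustration. It prints the largest absolute difference over k = 0, ..., 10.

```python
from math import comb, exp, factorial


def b(k, n, p):
    """Binomial term b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def poisson(k, lam):
    """Poisson expression p(k; lambda) of (5.7)."""
    return exp(-lam) * lam**k / factorial(k)


def max_gap(n, lam, kmax=10):
    """Largest |b(k; n, lam/n) - p(k; lam)| for k = 0, ..., kmax."""
    return max(abs(b(k, n, lam / n) - poisson(k, lam)) for k in range(kmax + 1))


if __name__ == "__main__":
    lam = 1.0  # illustrative value
    for n in (10, 100, 1000):
        print(n, max_gap(n, lam))
```

For fixed λ the gap shrinks roughly in proportion to 1/n, which matches the remark that (5.4) holds up to terms of order of magnitude 1/n.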

Examples. (a) The entries $p_m$ of the last column of table 3 in
chapter IV give the values p(m; 1). In the preceding columns $b_m$
stands for b(m; N, 1/N). The table enables us to compare the Poisson
distribution p(m; 1) with the binomial distributions with p = 1/n and
n = 3, 4, 5, 6, 10. It will be seen that the agreement is surprisingly
good despite the small values of n.

(b) Table 2 compares p(k; 1) to the binomial distribution with
n = 100, p = 1/100. It shows the approximation to be satisfactory for
many purposes. As an example take the occurrence of the combination
(7, 7) among 100 pairs of random digits, which should have the
binomial distribution b(k; 100, 1/100). The last column of table 2⁷ gives
the actual counts in 100 batches of 100 pairs of random digits each. To obtain
relative frequencies all entries of the last column should be divided by
100. These frequencies agree reasonably with the theoretical probabilities.
(As judged by the χ²-criterion, chance fluctuations should, in
about 75 out of 100 similar cases, produce larger deviations of observed
frequencies from the theoretical probabilities.)


TABLE 2

AN EXAMPLE OF THE POISSON APPROXIMATION

  k    b(k; 100, 1/100)     p(k; 1)      N_k
  0       0.366 032        0.367 879     41
  1        .369 730         .367 879     34
  2        .184 865         .183 940     16
  3        .060 999         .061 313      8
  4        .014 942         .015 328      0
  5        .002 898         .003 066      1
  6        .000 463         .000 511      0
  7        .000 063         .000 073      0
  8        .000 007         .000 009      0
  9        .000 001         .000 001      0

The first columns illustrate the Poisson approximation to the binomial distribution.
The last column records the number of batches of 100 pairs of random digits
each in which the combination (7, 7) appears exactly k times.


(c) Birthdays. What is the probability, $p_k$, that in a company of
500 people exactly k will have birthdays on New Year’s Day? If the
500 people are chosen at random, we may apply the scheme of 500
Bernoulli trials with probability of success p = 1/365. Then
$p_0 = (364/365)^{500}$ = 0.2537.... For the Poisson approximation we
put λ = 500/365 = 1.3699.... Then p(0; λ) = 0.2541..., which involves
an error only in the fourth decimal place. For k = 1, 2, ...
the correct values of $p_k$ as calculated from the binomial formula are
$p_1$ = 0.3484..., $p_2$ = 0.2388..., $p_3$ = 0.1089..., $p_4$ = 0.0372...,
$p_5$ = 0.0101..., $p_6$ = 0.0023.... The corresponding Poisson approximations
are p(1; λ) = 0.3481..., p(2; λ) = 0.2385..., p(3; λ) = 0.1089...,
p(4; λ) = 0.0373..., p(5; λ) = 0.0102..., p(6; λ) = 0.0023....
All errors are in the fourth decimal place.

(d) Defective items. Suppose that screws are produced under statistical
quality control so that it is legitimate to apply the Bernoulli
scheme of trials. If the probability of a screw’s being defective is
p = 0.015, then the probability that a box of 100 screws does not contain
a defective one is $(0.985)^{100}$ = 0.22061. The corresponding
Poisson approximation is $e^{-1.5}$ = 0.22313..., which should be close
enough for most practical purposes. We now ask: How many screws
should a box contain in order that the probability of finding at least
100 conforming screws be 0.8 or better? If 100 + x is the required
number, then x is a small integer. To apply the Poisson approximation
for n = 100 + x trials we should put λ = np, but np is approximately
100p = 1.5. We then require the smallest integer x for which

(5.8)    $e^{-1.5}\left\{1 + \dfrac{1.5}{1} + \cdots + \dfrac{(1.5)^{x}}{x!}\right\} \ge 0.8.$

In tables⁸ we find that for x = 1 the left side is approximately 0.56,
and for x = 2 it is 0.809. Thus the Poisson approximation would
lead to the conclusion that 102 screws are required. Since 0.809 is
dangerously near the given threshold of 0.8, the number 103 is safer.
Actually the probability of finding at least 100 conforming screws in a
box of 102 is

$(0.985)^{102} + \binom{102}{1}(0.985)^{101}(0.015) + \binom{102}{2}(0.985)^{100}(0.015)^{2} = 0.8022\ldots$


⁷ M. G. Kendall and Babington Smith, Tables of random sampling numbers,
Tracts for Computers No. 24, Cambridge, 1940.



(e) Centenarians. At birth any particular person has a small chance 
of living 100 years, and in a large community the number of yearly 
births is large. Owing to wars, epidemics, etc., different lives are not 
stochastically independent, but as a first approximation we may compare 
n births to n Bernoulli trials with death after 100 years as success. In 
a stable community, where neither size nor mortality rate changes 
appreciably, it is reasonable to expect that the frequency of years in 
which exactly k centenarians die is approximately p(k; λ), with λ depending
on the size and health of the community. Records of Switzerland
confirm this conclusion.⁹

(f) Misprints, raisins, etc. If in printing a book there is a constant
probability of any letter’s being misprinted, and if the conditions of
printing remain unchanged, then we have as many Bernoulli trials as
there are letters. The frequency of pages containing exactly k misprints
will then be approximately p(k; λ), where λ is a characteristic of
the printer. Occasional fatigue of the printer, difficult passages, etc.,
will increase the chances of errors and may produce clusters of misprints.
Thus the Poisson formula may be used to discover radical
departures from uniformity or from the state of statistical control.
A similar argument applies in many cases. For example, if many
raisins are distributed in the dough, we should expect that thorough
mixing will make the frequency of loaves with exactly k raisins
approximately p(k; λ), with λ a measure of the density of raisins in
the dough.


⁸ E. C. Molina, Poisson's exponential binomial limit, New York, 1942. (These
are tables giving p(k; λ) and p(k; λ) + p(k+1; λ) + ... for k ranging from 0 to
100.)

⁹ E. J. Gumbel, Les centenaires, Aktuárské Vědy, Prague, vol. 7 (1937), pp. 1-8.
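
The arithmetic of examples (c) and (d) can be retraced in a few lines. The following is a sketch, with helper names chosen here for illustration; the numerical inputs (500 people, p = 1/365, defect probability 0.015, threshold 0.8) are those of the examples.

```python
from math import comb, exp, factorial


def b(k, n, p):
    """Binomial term b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def poisson(k, lam):
    """Poisson expression p(k; lambda)."""
    return exp(-lam) * lam**k / factorial(k)


def poisson_cdf(x, lam):
    """p(0; lambda) + ... + p(x; lambda), the left side of (5.8)."""
    return sum(poisson(k, lam) for k in range(x + 1))


def smallest_x(lam, threshold):
    """Smallest integer x with poisson_cdf(x, lam) >= threshold."""
    x = 0
    while poisson_cdf(x, lam) < threshold:
        x += 1
    return x


def prob_at_least_100_good(box_size, p_defective=0.015):
    """Exact probability that a box holds at least 100 conforming screws."""
    return sum(b(d, box_size, p_defective) for d in range(box_size - 100 + 1))


if __name__ == "__main__":
    # Example (c): birthdays on New Year's Day among 500 people.
    for k in range(7):
        print(k, round(b(k, 500, 1 / 365), 4), round(poisson(k, 500 / 365), 4))
    # Example (d): screws.
    x = smallest_x(1.5, 0.8)
    print("x =", x, "exact probability for the box:", prob_at_least_100_good(100 + x))
```

Note that the printed values are rounded, whereas some figures in the text are written with trailing dots; small discrepancies in the last digit are therefore possible.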

6. THE POISSON DISTRIBUTION


In the preceding section we have used the Poisson expression (5.7)
merely as a convenient approximation to the binomial distribution in
the case of large n and small p. In connection with the matching
and occupancy problems of chapter IV we have studied different
probability distributions, which have also led to the Poisson expressions
p(k; λ) as a limiting form. We have here a special case of the
remarkable fact that there exist a few distributions of great universality
which occur in a surprisingly great variety of problems. The three
principal distributions, with ramifications throughout probability
theory, are the binomial distribution, the normal distribution (to be
introduced in the following chapter), and the Poisson distribution

(6.1)    $p(k; \lambda) = e^{-\lambda}\, \dfrac{\lambda^{k}}{k!},$

which we shall now consider on its own merits.

We note first that on adding the equations (6.1) for k = 0, 1, 2, ...
we get on the right side $e^{-\lambda}$ times the Taylor series for $e^{\lambda}$. Hence for
any fixed λ the quantities p(k; λ) add to unity, and therefore it is possible
to conceive of an ideal experiment in which p(k; λ) is the probability
of exactly k successes. We shall now indicate why many physical
experiments and statistical observations actually lead to such an interpretation
of (6.1). The examples of the next section will illustrate the
wide range and the importance of various applications of (6.1). The
true nature of the Poisson distribution will become apparent only in
connection with the theory of stochastic processes (cf. chapter XVII,
where a new approach to the Poisson distribution is given).

Consider a sequence of random events occurring in time, such as
radioactive disintegrations or incoming calls at a telephone exchange.
Each event is represented by a point on the time axis, and we are
concerned with chance distributions of points. There exist many different
types of such distributions, but their study belongs to the domain
of continuous probabilities which we have postponed to the second volume.
Here we shall be content to show that the simplest physical
assumptions lead to p(k; λ) as the probability of finding exactly k points
(events) within a fixed interval of specified length. Our methods are
necessarily crude, and we shall return to the same problem with more
adequate methods in chapter XVII.

The physical assumptions which we want to express mathematically
are that the conditions of the experiment remain constant in time, and
that non-overlapping time intervals are stochastically independent in the
sense that information concerning the number of events in one interval
reveals nothing about the other. The theory of probabilities in a continuum
makes it possible to express these statements directly, but being
restricted to discrete probabilities, we have to use an approximate
finite model and pass to the limit.

Imagine the unit time interval divided into a great number n of
intervals, each of length 1/n. Either a particular subinterval is empty,
or it contains at least one of our random points (or events), and we
agree to call the two possibilities failure and success, respectively. The
probability $p_n$ of success must be the same for all n subintervals, since
they have the same length. The assumed independence of non-overlapping
intervals then implies that we have n Bernoulli trials, and the
probability of exactly k successes is given by $b(k; n, p_n)$. Now the
number of successes is not necessarily the same as the number of random
points, since a subinterval may contain several random points.
However, it is natural to introduce the additional assumption that the
probability of two or more random points during a very short time
interval is in the limit negligible.¹⁰ In this case the probability of finding
exactly k random points in the unit time interval is given by the
limit of $b(k; n, p_n)$ as n → ∞. When we divide each subinterval into
two parts of equal length, we find that $p_n = 2p_{2n} - p_{2n}^{2}$; this equation
states that success in an interval of length 1/n means either success
in the left half, or success in the right half, or in both. It follows that
$p_n < 2p_{2n}$, and this suggests that $np_n$ increases monotonically (which
can be proved rigorously). If $np_n \to \lambda$, then
$b(k; n, p_n) \approx b(k; n, \lambda/n) \to p(k; \lambda)$,
and we find (6.1) as the probability that there is a total
of k random points contained in our unit interval. The assumption
$np_n \to \infty$ leads to no sensible result, as it would imply infinitely many
random points even in the smallest interval.

If, instead of the unit interval, we take an arbitrary interval of
length t and again use a subdivision into intervals of length 1/n, then
we have Bernoulli trials with the same probability $p_n$ of success, but
the number of trials is the integer nearest to nt rather than n. The
passage to the limit is the same, but we get λt instead of λ. This leads
us to consider

(6.2)    $p(k; \lambda t) = e^{-\lambda t}\, \dfrac{(\lambda t)^{k}}{k!}$

as the probability of finding exactly k points in a fixed interval of length t.
In particular, the probability of no point in an interval of length t is

(6.3)    $p(0; \lambda t) = e^{-\lambda t},$

and the probability of one or more points is therefore $1 - e^{-\lambda t}$.


¹⁰ This assumption is implicit in the intuitive picture of isolated random points.
However, it is necessary to exclude the possibility of our events appearing in doublets.
For example, if the events are automobile accidents, then the probability
of two events within a short time is negligible in comparison with the probability
of one event. On the other hand, an accident is likely to involve two cars, and if
the events mean “a car smashed” then they are likely to appear in pairs and our
assumption does not apply.


The parameter λ is a physical constant which determines the density
of points on the t-axis. The larger λ is, the smaller is the probability
(6.3) of finding no point. Suppose that a physical experiment is repeated
a great number N of times, and that each time we count the
number of events in an interval of fixed length t. Let $N_k$ be the number
of times that exactly k events are observed. Then

(6.4)    $N_0 + N_1 + N_2 + \cdots = N.$

The total number of points observed in the N experiments is

(6.5)    $N_1 + 2N_2 + 3N_3 + \cdots = T,$

and T/N is the average. If N is large, we expect that

(6.6)    $N_k \approx N p(k; \lambda t)$

(this lies at the root of all applications of probability and will be justified
and made more precise by the law of large numbers in chapter X).
Substituting from (6.6) into (6.5), we find

(6.7)    $T \approx N\{p(1; \lambda t) + 2p(2; \lambda t) + 3p(3; \lambda t) + \cdots\} = N e^{-\lambda t}\, \lambda t \left\{1 + \dfrac{\lambda t}{1} + \dfrac{(\lambda t)^{2}}{2!} + \cdots\right\} = N\lambda t$

and hence

(6.8)    $\lambda t \approx \dfrac{T}{N}.$

This relation gives us a means of estimating λ from observations and
of comparing theory with experiments. The examples of the next
section will illustrate this point.
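
The finite model used above, followed by the estimate (6.8), can be imitated by a small simulation. This is a sketch under stated assumptions: the function names, the seed, and the values λt = 3, n = 200 subintervals and N = 2000 repetitions are all illustrative and not from the text. Each experiment is a run of n Bernoulli trials with success probability λt/n, and the counts $N_k$ are tallied.

```python
import random
from collections import Counter


def simulate_counts(lam_t, n_sub, n_exp, seed=0):
    """Tally N_k over n_exp experiments, each made of n_sub Bernoulli trials."""
    rng = random.Random(seed)
    p = lam_t / n_sub  # success probability of one short subinterval
    counts = Counter()
    for _ in range(n_exp):
        k = sum(1 for _ in range(n_sub) if rng.random() < p)
        counts[k] += 1
    return counts


def estimate_lam_t(counts):
    """The estimate T/N of (6.8), with T and N formed as in (6.4) and (6.5)."""
    n_total = sum(counts.values())
    t_total = sum(k * n_k for k, n_k in counts.items())
    return t_total / n_total


if __name__ == "__main__":
    counts = simulate_counts(lam_t=3.0, n_sub=200, n_exp=2000, seed=1)
    print(sorted(counts.items()))
    print("estimated lambda*t:", estimate_lam_t(counts))
```

With these sizes the estimate usually lands within a few hundredths of the value used to generate the data; the spread shrinks as N grows.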


Spatial Distributions 


We have considered the distribution of random events or points
along the t-axis, but the same argument applies to the distribution of
points in plane or space. Instead of intervals of length t we have domains
of area or volume t, and the fundamental assumption is that the
probability of finding k points in any specified domain depends only
on the area or volume of the domain but not on its shape. Otherwise
we have the same assumptions as before: (1) if t is small, the probability
of finding more than one point in a domain of volume t is small as
compared to t; (2) non-overlapping domains are mutually independent.
To find the probability that a domain of volume t contains exactly k
random points, we subdivide it into n subdomains and approximate
the required probability by the probability of k successes in n trials.
This means neglecting the possibility of finding more than one point
in the same subdomain, but our assumption (1) implies that the error
tends to zero as n → ∞. In the limit we get again the Poisson distribution
(6.2). Stars in space, raisins in cake, weed seeds among grass
seeds, flaws in materials, animal litters in fields are distributed in accordance
with the Poisson law. See examples (7.b) and (7.e).


7. OBSERVATIONS FITTING THE POISSON
DISTRIBUTION ¹¹


(a) Radioactive disintegrations. A radioactive substance emits α-particles;
the number of particles reaching a given portion of space during
time t is the best-known example of random events obeying the Poisson
law. Of course, the substance continues to decay, and in the long run
the density of α-particles will decline. However, with radium it takes
years before a decrease of matter can be detected; for relatively short
periods the conditions may be considered constant, and we have an
ideal realization of the hypotheses which led to the Poisson distribution.

In a famous experiment¹² a radioactive substance was observed during
N = 2608 time intervals of 7.5 seconds each; the number of particles
reaching a counter was obtained for each period. Table 3 records
the number $N_k$ of periods with exactly k particles. The total number
of particles is $T = \sum k N_k$ = 10,094, the average T/N = 3.870. The
theoretical values N p(k; 3.870) are seen to be rather close to the observed
numbers $N_k$. To judge the closeness of fit, an estimate of the
probable magnitude of chance fluctuations is required. Statisticians
judge the closeness of fit by the χ²-criterion. Measuring by this standard,
we should expect that under ideal conditions about 17 out of 100
comparable cases would show worse agreement than exhibited in table 3.


TABLE 3

EXAMPLE (a): RADIOACTIVE DISINTEGRATIONS

    k        N_k     N p(k; 3.870)
    0         57         54.399
    1        203        210.523
    2        383        407.361
    3        525        525.496
    4        532        508.418
    5        408        393.515
    6        273        253.817
    7        139        140.325
    8         45         67.882
    9         27         29.189
  k ≥ 10      16         17.075
  Total     2608       2608.000


¹¹ The Poisson distribution has become known as the law of small numbers or of
rare events. These are misnomers which proved detrimental to the realization of
the fundamental role of the Poisson distribution. The following examples will
show how misleading the two names are.

¹² Rutherford, Chadwick, and Ellis, Radiations from radioactive substances, Cambridge,
1920, p. 172. Table 3 and the χ²-estimate of the text are taken from
H. Cramér, Mathematical methods of statistics, Uppsala and Princeton, 1945, p. 436.

(b) Flying-bomb hits on London. As an example of a spatial distribution
of random points consider the statistics of flying-bomb hits in
the south of London during World War II. The entire area is divided
into N = 576 small areas of t = ¼ square kilometers each, and table 4
records the number $N_k$ of areas with exactly k hits.¹³ The total number
of hits is $T = \sum k N_k$ = 537, the average λt = T/N = 0.9323.... The
fit of the Poisson distribution is surprisingly good; as judged by the
χ²-criterion, under ideal conditions some 88 per cent of comparable
observations should show a worse agreement. It is interesting to note
that most people believed in a tendency of the points of impact to
cluster. If this were true, there would be a higher frequency of areas
with either many hits or no hit and a deficiency in the intermediate
classes. Table 4 indicates perfect randomness and homogeneity of the
area; we have here an instructive illustration of the established fact
that to the untrained eye randomness appears as regularity or tendency
to cluster.


TABLE 4

EXAMPLE (b): FLYING-BOMB HITS ON LONDON

  k                    0        1        2       3       4     5 and over
  N_k                229      211       93      35       7         1
  N p(k; 0.9323)   226.74   211.39    98.54   30.62    7.14       1.57


¹³ The figures are taken from R. D. Clarke, An application of the Poisson distribution,
Journal of the Institute of Actuaries, vol. 72 (1946), p. 48.


TABLE 5

EXAMPLE (c): CHROMOSOME INTERCHANGES INDUCED BY X-RAY IRRADIATION

  Experiment    λ in N p(k; λ)    Total N    χ²-level in per cent
       1           0.35508          1073             95
       2           0.45601           682             85
       3           0.27717           368             65
       4           0.11808          2566             65
       5           0.25296           759             45
       6           0.21059           793             45
       7           0.28631           482             40
       8           0.33572           697             35
       9           0.39867          1199             20
      10           0.40544           883             20
      11           0.49339           756              5

[The columns giving the observed numbers $N_k$ and the values N p(k; λ) of cells
with k = 0, 1, 2, and 3 or more interchanges are not legible in this copy.]

(c) Chromosome interchanges in cells. Irradiation by X-rays produces
certain processes in organic cells which we call chromosome interchanges.
As long as radiation continues, the probability of such interchanges
remains constant, and, according to theory, the numbers $N_k$
of cells with exactly k interchanges should follow a Poisson distribution.
The theory is also able to predict the dependence of the parameter
λ on the intensity of radiation, the temperature, etc., but we shall
not enter into these details. Table 5 records the result of eleven different
series of experiments.¹⁴ These are arranged according to goodness
of fit. The last column indicates the approximate percentage of
ideal cases in which chance fluctuations would produce a worse agreement
(as judged by the χ²-standard). The agreement between theory
and observation is striking.

(d) Connections to wrong number. Table 6 shows statistics of telephone
connections to a wrong number.¹⁵ A total of N = 267 numbers
was observed; $N_k$ indicates how many numbers had exactly k wrong
connections. The Poisson distribution p(k; 8.74) shows again an excellent
fit. (As judged by the χ²-criterion the deviations are near the
median value.) In Thorndike’s paper the reader will find other telephone
statistics following the Poisson law. Sometimes (as with party
lines, calls from groups of coin boxes, etc.) there is an obvious interdependence
among the events, and the Poisson distribution no longer
fits.


¹⁴ D. G. Catcheside, D. E. Lea, and J. M. Thoday, Types of chromosome structural
change induced by the irradiation of Tradescantia microspores, Journal of
Genetics, vol. 47 (1945-46), pp. 113-136. Our table is table IX of this paper, except
that the χ²-levels were recomputed, using a single degree of freedom.

¹⁵ The observations are taken from F. Thorndike, Applications of Poisson’s probability
summation, The Bell System Technical Journal, vol. 5 (1926), pp. 604-624.
This paper contains a graphical analysis of 32 different statistics.

TABLE 6

EXAMPLE (d): CONNECTIONS TO WRONG NUMBER

    k       N_k     N p(k; 8.74)
   0-2        1         2.05
    3         5         4.76
    4        11        10.39
    5        14        18.16
    6        22        26.45
    7        43        33.03
    8        31        36.09
    9        40        35.04
   10        35        30.63
   11        20        24.34
   12        18        17.72
   13        12        11.92
   14         7         7.44
   15         6         4.33
  ≥ 16        2         4.65
  Total     267       267.00


Figure 1. Bacteria on a Petri plate. 


(e) Bacteria and blood counts. Figure 1 reproduces a photograph of
a Petri plate with bacterial colonies, which are visible under the microscope
as dark spots. The plate is divided into small squares. Table 7
reproduces the observed numbers of squares with exactly k dark spots
in eight experiments with as many different kinds of bacteria.¹⁶ We
have here a representative of an important practical application of the
Poisson distribution to spatial distributions of random points.


TABLE 7

EXAMPLE (e): COUNTS OF BACTERIA

  k                  0       1       2       3      4      5     6     7    χ²-level
  Observed N_k       5      19      26      26     21     13     8              97
  Poisson theor.    6.1    18.0    26.7    26.4   19.6   11.7   9.5
  Observed N_k      26      40      38      17      7                           66
  Poisson theor.   27.5    42.2    32.5    16.7    9.1
  Observed N_k      59      86      49      30     20                           26
  Poisson theor.   55.6    82.2    60.8    30.0   15.4
  Observed N_k      83     134     135     101     40     16     7              63
  Poisson theor.   75.0   144.5   139.4    89.7   43.3   16.7   7.4
  Observed N_k       8      16      18      15      9      7                    97
  Poisson theor.    6.8    16.2    19.2    15.1    9.0    6.7
  Observed N_k       …      11      11      11      7      8                    53
  Poisson theor.    3.9    10.4    13.7    12.0     …      …
  Observed N_k       3       7      14      21     20     19     7     9        85
  Poisson theor.    2.1     8.2    15.8    20.2   19.5    15    9.6   9.6
  Observed N_k      60      80      45      16      9                           78
  Poisson theor.   62.6    75.8    45.8    18.5    7.3

The last entry in each row includes the figures for higher classes and should
be labeled “k or more.”


¹⁶ The table is taken from J. Neyman, Lectures and conferences on mathematical
statistics (mimeographed), Dept. of Agriculture, Washington, 1938. The original
(by T. Matuszewsky, J. Supinska, and J. Neyman) appeared, together with related
material, in Zentralblatt für Bakteriologie, Parasitenkunde und Infektionskrankheiten,
II Abt., vol. 95 (1936).
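
The comparison of theory with observation in example (b) can be reproduced from the figures quoted there. The sketch below uses N = 576 and T = 537 as stated in the text and the observed row of table 4; the function name is an illustrative choice. T is taken from the text rather than recomputed, since the class "5 and over" does not reveal the exact number of hits.

```python
from math import exp, factorial

observed = [229, 211, 93, 35, 7, 1]  # N_k for k = 0, ..., 4 and "5 and over"
N = 576                              # number of small areas
T = 537                              # total number of hits, as given in the text
lam_t = T / N                        # estimate (6.8)


def expected_counts(n_total, lam, last_class):
    """N p(k; lam) for k < last_class, with the remainder lumped into one class."""
    probs = [exp(-lam) * lam**k / factorial(k) for k in range(last_class)]
    probs.append(1.0 - sum(probs))   # probability of last_class or more
    return [n_total * q for q in probs]


expected = expected_counts(N, lam_t, last_class=5)

if __name__ == "__main__":
    print("lambda*t =", round(lam_t, 4))
    for k, (obs, exp_k) in enumerate(zip(observed, expected)):
        label = str(k) if k < 5 else "5 and over"
        print(label, obs, round(exp_k, 2))
```

The same few lines apply to any of the other tables once N and T are known; only the lumping of the highest class needs to follow the table at hand.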

8. WAITING TIMES. THE NEGATIVE BINOMIAL
DISTRIBUTION


Consider a succession of n Bernoulli trials and let us inquire how
long it will take for the rth success to turn up. Here r is a fixed positive
integer. The total number of successes in n trials may, of course,
fall short of r, but the probability that the rth success occurs at the
trial number ν ≤ n is clearly independent of n and depends only on
ν, r, and p. Since necessarily ν ≥ r, it is preferable to write ν = k + r.
The probability that the rth success occurs at the trial number r + k (where
k = 0, 1, ...) will be denoted by f(k; r, p). It equals the probability that
exactly k failures precede the rth success. This event occurs if, and only
if, among the first r + k − 1 trials there are exactly k failures and the following,
or (r+k)th, trial results in success; the corresponding probabilities
are $\binom{r+k-1}{k} p^{r-1} q^{k}$ and p, so that

(8.1)    $f(k; r, p) = \binom{r+k-1}{k} p^{r} q^{k}.$

Rewriting the binomial coefficient in accordance with formula II(12.4),
we find the alternative form

(8.2)    $f(k; r, p) = \binom{-r}{k} p^{r} (-q)^{k}, \qquad k = 0, 1, 2, \ldots.$


Suppose now that Bernoulli trials are continued as long as necessary
for r successes to turn up. A typical sample point is represented by a
sequence containing an arbitrary number, k, of letters F and exactly
r letters S, the sequence terminating by an S; the probability of such
a point is, by definition, $p^{r} q^{k}$. We must ask, however, whether it is
possible that the trials never end, that is, whether an infinite sequence
of trials may produce fewer than r successes. Now $\sum_{k=0}^{\infty} f(k; r, p)$ is the
probability that the rth success occurs after finitely many trials; accordingly,
the possibility of an infinite sequence with fewer than r successes
can be discounted if, and only if,

(8.3)    $\sum_{k=0}^{\infty} f(k; r, p) = 1.$

To prove that (8.3) holds it suffices to note that by the binomial theorem

(8.4)    $\sum_{k=0}^{\infty} \binom{-r}{k} (-q)^{k} = (1 - q)^{-r} = p^{-r}.$

Multiplying (8.4) by $p^{r}$ we get (8.3).

In our waiting time problem r is necessarily a positive integer, but
the quantity defined by either (8.1) or (8.2) is non-negative and (8.3)
holds for any positive r. For arbitrary fixed real r > 0 and 0 < p < 1
the sequence {f(k; r, p)} is called a negative binomial distribution. It
occurs in many applications (and we have encountered it in problem
V, 24 as the limiting form of the Polya distribution). When r is a
positive integer, {f(k; r, p)} may be interpreted as the probability distribution
for the waiting time to the rth success; as such it is also called
the Pascal distribution. For r = 1 it reduces to the geometric distribution
$\{pq^{k}\}$.
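
The statement that (8.3) holds for every real r > 0 can be checked numerically. In the sketch below (function names and test values are illustrative assumptions) the binomial coefficient of (8.1) is written with gamma functions, Γ(r+k)/{Γ(k+1)Γ(r)}, which agrees with the ordinary coefficient when r is a positive integer.

```python
from math import exp, lgamma, log


def f(k, r, p):
    """Negative binomial term f(k; r, p) of (8.1) for real r > 0 and 0 < p < 1."""
    log_coeff = lgamma(r + k) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + r * log(p) + k * log(1 - p))


def partial_sum(r, p, terms):
    """f(0; r, p) + ... + f(terms-1; r, p), a partial sum of the series (8.3)."""
    return sum(f(k, r, p) for k in range(terms))


if __name__ == "__main__":
    for r, p in ((1, 0.4), (3, 0.5), (2.5, 0.3)):  # illustrative values
        print(r, p, partial_sum(r, p, 2000))
```

For r = 1 the terms reduce to the geometric distribution, which gives a convenient spot check of the formula.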


TABLE 8

PROBABILITIES (8.5)

   r      u_r          U_r           r      u_r          U_r
   0   0.079 589    0.079 589       15   0.023 171    0.917 941
   1    .079 589     .159 178       16    .019 081     .937 022
   2    .078 785     .237 963       17    .015 447     .952 469
   3    .077 177     .315 140       18    .012 283     .964 752
   4    .074 790     .389 931       19    .009 587     .974 338
   5    .071 674     .461 605       20    .007 338     .981 676
   6    .067 902     .529 506       21    .005 504     .987 180
   7    .063 568     .593 073       22    .004 041     .991 220
   8    .058 783     .651 855       23    .002 901     .994 121
   9    .053 671     .705 527       24    .002 034     .996 155
  10    .048 363     .753 890       25    .001 392     .997 547
  11    .042 989     .796 879       26    .000 928     .998 475
  12    .037 676     .834 555       27    .000 602     .999 077
  13    .032 538     .867 094       28    .000 379     .999 456
  14    .027 676     .894 770       29    .000 232     .999 688

$u_r$ is the probability that, at the moment the first match box is found empty,
the second contains exactly r matches, assuming that initially each box contained
50 matches. $U_r = u_0 + u_1 + \cdots + u_r$ is the corresponding probability
of having not more than r matches.

Example. Banach’s match box problem.¹⁷ A certain mathematician
always carries one match box in his right pocket and one in his left.
When he wants a match, he selects a pocket at random, the successive
choices thus constituting Bernoulli trials with p = ½. Suppose that
initially each box contained exactly N matches and consider the moment
when, for the first time, our mathematician discovers that a box
is empty. At that moment the other box may contain 0, 1, 2, ..., N
matches, and we denote the corresponding probabilities by $u_r$. Let us
identify “success” with choice of the left pocket. The left pocket will
be found empty at a moment when the right pocket contains exactly
r matches if, and only if, exactly N − r failures precede the (N+1)st
success. The probability of this event is f(N−r; N+1, ½). The same
argument applies to the right pocket and therefore the required probability
is

(8.5)    $u_r = 2f(N-r;\, N+1,\, \tfrac{1}{2}) = \binom{2N-r}{N}\, 2^{-2N+r}.$

Numerical values for the case N = 50 are given in table 8. [Cf. problems
21-23 and example IX(3.f).]

9. THE MULTINOMIAL DISTRIBUTION 


The binomial distribution can easily be generalized to the case of n
repeated independent trials where each trial can have one of several
outcomes. Denote the possible outcomes of each trial by $E_1, \ldots, E_r$,
and suppose that the probability of the realization of $E_i$ in each trial
is $p_i$ (i = 1, ..., r). For r = 2 we have Bernoulli trials; in general,
the numbers $p_i$ are subject only to the condition

(9.1)    $$p_1 + \cdots + p_r = 1, \qquad p_i \ge 0.$$

The result of n trials is a succession like $E_3E_1E_2\ldots$. The probability
that in n trials $E_1$ occurs $k_1$ times, $E_2$ occurs $k_2$ times, etc., is

(9.2)    $$\frac{n!}{k_1!\,k_2!\cdots k_r!}\; p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r};$$

here the $k_i$ are arbitrary non-negative integers subject to the obvious
condition

(9.3)    $$k_1 + k_2 + \cdots + k_r = n.$$

If r = 2, then (9.2) reduces to the binomial distribution with $p_1 = p$,
$p_2 = q$, $k_1 = k$, $k_2 = n - k$. The proof in the general case proceeds
along the same lines, starting with formula II(4.7).

17 Communicated by H. Steinhaus.

Formula (9.2) is called the multinomial distribution because the right-hand
member is the general term of the multinomial expansion of
$(p_1 + \cdots + p_r)^n$. Its main application is to sampling with replacement
when the individuals are classified into more than two categories (e.g.,
according to professions).


Examples. (a) In rolling twelve dice, what is the probability of getting
each face twice? Here $E_1, \ldots, E_6$ represent the six faces, all $k_i$
equal 2, and all $p_i$ equal 1/6. Therefore, the answer is
$(12!)\,2^{-6}\,6^{-12} = 0.0034\ldots$.

(b) Sampling. Let a population of N elements be divided into subclasses
$E_1, \ldots, E_r$ of sizes $Np_1, \ldots, Np_r$. The multinomial distribution
gives the probabilities of the several possible compositions of a
random sample with replacement of size n taken from this population.

(c) Multiple Bernoulli trials. Two sequences of Bernoulli trials with
probabilities of success and failure $p_1, q_1$, and $p_2, q_2$, respectively, may
be considered one compound experiment with four possible outcomes
in each trial, namely, the combinations (S, S), (S, F), (F, S), (F, F).
The assumption that the two original sequences are independent is
translated into the statement that the probabilities of the four outcomes
are $p_1p_2$, $p_1q_2$, $q_1p_2$, $q_1q_2$, respectively. If $k_1, k_2, k_3, k_4$ are four
integers adding to n, the probability that in n trials SS will appear $k_1$
times, SF $k_2$ times, etc., is

(9.4)    $$\frac{n!}{k_1!\,k_2!\,k_3!\,k_4!}\; p_1^{k_1+k_2}\, q_1^{k_3+k_4}\, p_2^{k_1+k_3}\, q_2^{k_2+k_4}.$$

A special case occurs in sampling inspection. An item is conforming or
defective with probabilities p and q. It may or may not be inspected
with corresponding probabilities p′ and q′. The decision of whether
an item is inspected is made without knowledge of its quality, so that
we have independent trials. (Cf. problems 25, 26, and IX, 12.)
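The multinomial formula (9.2) and example (a) can be reproduced in a few lines. This is a sketch only; the function name `multinomial` and the test parameters (n = 10 with p = 0.3, and a three-class case with n = 5) are assumptions chosen for illustration.

```python
from math import comb, factorial, prod

def multinomial(ks, ps):
    """Probability (9.2) that outcome E_i occurs ks[i] times in n = sum(ks) trials."""
    coeff = factorial(sum(ks))
    for k in ks:
        coeff //= factorial(k)          # exact integer multinomial coefficient
    return coeff * prod(p ** k for p, k in zip(ps, ks))

# Example (a): twelve dice, each face appearing twice.
twelve_dice = multinomial([2] * 6, [1 / 6] * 6)

# r = 2: formula (9.2) reduces to the binomial term b(k; n, p).
two_class = multinomial([3, 7], [0.3, 0.7])
binomial_term = comb(10, 3) * 0.3 ** 3 * 0.7 ** 7

# The terms add to one: sum over all (k1, k2, k3) with k1 + k2 + k3 = 5.
ps = [0.2, 0.3, 0.5]
total = sum(
    multinomial([k1, k2, 5 - k1 - k2], ps)
    for k1 in range(6)
    for k2 in range(6 - k1)
)

print(twelve_dice, two_class, binomial_term, total)
```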


10. PROBLEMS FOR SOLUTION 


1. Assuming all sex distributions to be equally probable, what proportion
of families with exactly six children should be expected to have three boys and 
three girls? 


2. A bridge player had no ace in three consecutive hands. Did he have 
reason to complain of ill luck? 


3. How long has a series of random digits to be in order for the probability
of the digit 7 appearing to be at least 9/10?

4. How many independent bridge dealings are required in order for the
probability of a preassigned player having four aces at least once to be 1/2 or
better? Solve again for some player instead of a given one.

5. If the probability of hitting a target is 1/5 and ten shots are fired
independently, what is the probability of the target's being hit at least twice?

6. In problem 5, find the conditional probability of the target’s being hit at 
least twice, assuming that at least one hit is scored. 

7. Find the probability that a hand of thirteen bridge cards selected at
random contains exactly two red cards. Compare it with the corresponding
probability in Bernoulli trials with p = 1/2. (For a definition of bridge see
footnote 1, chapter I.)

8. What is the probability that the birthdays of six people fall in two calen- 
dar months leaving exactly ten months free? (Assume independence and 
equal probabilities for all months.) 

9. In rolling six true dice, find the probability of obtaining (a) at least one,
(b) exactly one, (c) exactly two, aces. Compare with the Poisson
approximations.

10. If there are on the average 1 per cent left-handers, estimate the chances 
of having at least four left-handers among 200 people. 

11. A book of 500 pages contains 500 misprints. Estimate the chances that 
a given page contains at least three misprints. 

12. Colorblindness appears in 1 per cent of the people in a certain popula- 
tion. How large must a random sample (with replacements) be if the proba- 
bility of its containing a colorblind person is to be 0.95 or more? 

13. In the preceding exercise, what is the probability that a sample of 100
will contain (a) no, (b) two or more, colorblind people?

14. Estimate the number of raisins which a cookie should contain on the 
average if it is desired that the probability of a cookie to contain at least one 
raisin be 0.99 or more. 

15. The probability of a royal flush in poker is p = 1/649,740. How large 
has n to be to render the probability of no royal flush in n hands smaller than 
1/e = 4? (Note: No calculations are necessary for the solution.) 

16. A book of n pages contains on the average λ misprints per page.
Estimate the probability that at least one page will contain more than k misprints.

17. Suppose that there exist two kinds of stars (or raisins in a cake, or flaws
in a material). The probability that a given volume contains j stars of the
first kind is p(j; a), and the probability that it contains k stars of the second
kind is p(k; b); the two events are assumed to be independent. Prove that
the probability that the volume contains a total of n stars is p(n; a + b).
(Interpret the assertion and the assumptions abstractly.)

18. A traffic problem. The flow of traffic at a certain street crossing is 
described by saying that the probability of a car’s passing during any given 
second is a constant p; and that there is no interaction between the passing 
of cars at different seconds. Treating seconds as indivisible time units, the 
model of Bernoulli trials applies. Suppose that a pedestrian can cross the 
street only if no car is to pass during the next three seconds. Find the proba- 
bility that the pedestrian has to wait for exactly k = 0, 1, 2, 3, 4 seconds. 
(The corresponding general formulas are not obvious and will be derived in
connection with the theory of success runs in chapter XIII, section 7.)

19. Two people toss a true coin n times each. Find the probability that 
they will score the same number of heads. 

20. In a sequence of Bernoulli trials with probability p for success, find the 
probability that a successes will occur before b failures. (Note: The issue is 
decided after at most a + b — 1 trials. This problem played a role in the 
classical theory of games in connection with the question of how to divide 
the pot when the game is interrupted at a moment when one player lacks a 
points to victory, the other b points.) 

21. In Banach's match box problem (section 8) find the probability that at
the moment when the first box is emptied (not found empty) the other contains
exactly r matches (where r = 1, 2, ..., N).

22. Continuation. Using the preceding result, find the probability z that
the box first emptied is not the one first found to be empty. Show that the
expression thus obtained reduces to $z = \binom{2N}{N} 2^{-2N-1}$ or
$\tfrac12 (N\pi)^{-1/2}$, approximately.

23. Proofs of a certain book were read independently by two proofreaders
who found, respectively, $k_1$ and $k_2$ misprints; $k_{12}$ misprints were found by both.
Give a reasonable estimate of the unknown number, n, of misprints in the
proofs. (Assume that proofreading corresponds to Bernoulli trials in which
the two proofreaders have, respectively, probabilities $p_1$ and $p_2$ of catching a
misprint. Use the law of large numbers.)

Note: The problem describes in simple terms an experimental setup used 
by Rutherford for the count of scintillations. 

24. To estimate the size of an animal population by trapping,$^{18}$ traps are
set r times in succession. Assuming that each animal has the same probability
q of being trapped; that originally there were n animals in all; and that the
only changes in the situation between the successive settings of traps are that
animals have been trapped (and thus removed); find the probability that
the r trappings yield, respectively, $n_1, n_2, \ldots, n_r$ animals.

25. Multiple Bernoulli trials. In example (9.c) find the conditional
probabilities p and q of (S, F) and (F, S), respectively, assuming that one of these
combinations has occurred. Show that p > 1/2 or p < 1/2, according as $p_1 > p_2$
or $p_2 > p_1$.

26. Continuation.$^{19}$ If in n pairs of trials exactly m resulted in one of the
combinations (S, F) or (F, S), show that the probability that (S, F) has occurred
exactly k times is b(k; m, p).

27. Combination of the binomial and Poisson distributions. Suppose that the
probability of an insect's laying r eggs is p(r; λ) and that the probability of an
egg's developing is p. Assuming mutual independence of the eggs, show that
the probability of a total of k survivors is given by the Poisson distribution
with parameter λp.

Note: Another example for the same situation: the probability of k chromosome
breakages is p(k; λ), and the probability of a breakage healing is p.
(For additional examples of a similar nature see IX(1.d) and chapter XII,
section 1.)

18 P. A. P. Moran, A mathematical theory of animal trapping, Biometrika, vol.
38 (1951), pp. 307-311.

19 A. Wald, Sequential tests of statistical hypotheses, Annals of Mathematical
Statistics, vol. 16 (1945), p. 166. Wald uses the results given above to devise a
practical method of comparing two empirically given sequences of trials (say, the
output of two machines), with a view of selecting the one with the greater
probability of success. He reduces this problem to the simpler one of finding whether
in a sequence of Bernoulli trials the frequency of success differs significantly from 1/2.


28. Prove the theorem:$^{20}$ The maximal term of the multinomial distribution
(9.2) satisfies the inequalities

(10.1)    $$np_i - 1 < k_i \le (n + r - 1)p_i, \qquad i = 1, 2, \ldots, r.$$

(Hint: Prove first that the term is maximal if, and only if, $p_i k_j \le p_j(k_i + 1)$
for each pair (i, j). Add these inequalities for all j, and also for all i ≠ j.)

29. The terms p(k; λ) of the Poisson distribution reach their maximum
when k is the largest integer not exceeding λ.

Note: Problems 30-34 refer to the Poisson approximation of the binomial
distribution. It is understood that λ = np, and m is the largest integer not exceeding (n + 1)p
(that is, m is the index of the central term of the binomial distribution).

30. Show that as k goes from 0 to ∞ the ratios $a_k = b(k; n, p)/p(k; \lambda)$ first
increase, then decrease, reaching their maximum for k = m.

31. As k increases, the terms b(k; n, p) are first smaller, then larger, and
then again smaller than p(k; λ).

32. If n → ∞ and p → 0 so that np = λ remains constant, then

$$b(k; n, p) \to p(k; \lambda)$$

uniformly for all k.

33. Show that

(10.2)    $$\frac{\lambda^k}{k!}\left(1-\frac{\lambda}{n}\right)^{n-k} \ge b(k; n, p) \ge \frac{\lambda^k}{k!}\left(1-\frac{k}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k}.$$

34. Conclude from (10.2), using the inequalities II(8.12), that

(10.3)    $$p(k; \lambda)\, e^{k\lambda/n} > b(k; n, p) > p(k; \lambda)\, e^{-k^2/(n-k) - \lambda^2/(n-\lambda)}.$$

Note: Although (10.2) is very crude, the inequalities (10.3) provide excellent
error estimates. It is easy to improve on (10.3) by calculations similar to
those used in chapter II, section 9. Incidentally, using the result of problem
30, it is obvious that the exponent on the left in (10.3) may be replaced by
$m\lambda/n$ which is $\le (p + n^{-1})\lambda$.
Further Limit Theorems 
35. Binomial approximation to the hypergeometric distribution. A population
of N elements is divided into red and black elements in the proportion p:q
(where p + q = 1). A sample of size n is taken without replacement. The
probability that it contains exactly k red elements is given by the
hypergeometric distribution of chapter II, section 6. Show that as N → ∞ this
probability approaches b(k; n, p).

20 In the first edition it was only asserted that $|k_i - np_i| \le r$. The present
improvement and its elegant proof are due to P. A. P. Moran.

36. In the preceding problem let p be small, n large, and λ = np of moderate
magnitude. The hypergeometric distribution can then be approximated by
the Poisson distribution p(k; λ). Verify this directly without using the binomial
approximation.

37. In the negative binomial distribution {f(k; r, p)} of section 8 let q → 0
and r → ∞ in such a way that rq = λ remains fixed. Show that

$$f(k; r, p) \to p(k; \lambda).$$

(Note: This provides a limit theorem for the Polya distribution; cf. problem
V, 24.)

38. Multiple Poisson distribution. When n is large and $np_j = \lambda_j$ is moderate,
the multinomial distribution (9.2) can be approximated by

$$e^{-(\lambda_1 + \cdots + \lambda_{r-1})}\; \frac{\lambda_1^{k_1} \cdots \lambda_{r-1}^{k_{r-1}}}{k_1!\,k_2! \cdots k_{r-1}!}.$$

Prove also that the terms of this distribution add to unity. (Note that
problem 17 refers to a double Poisson distribution.)


39. (a) Derive (3.6) directly from (3.5) using the obvious relation
b(k; n, p) = b(n−k; n, q).

(b) Deduce the binomial distribution both by induction and from the general
summation formula IV(3.1).

40. Prove $\sum k\, b(k; n, p) = np$, and $\sum k^2\, b(k; n, p) = n^2p^2 + npq$.

41. Prove $\sum k^2\, p(k; \lambda) = \lambda^2 + \lambda$.
42. Verify the identity

(10.4)    $$\sum_{\nu=0}^{k} b(\nu; n_1, p)\, b(k - \nu; n_2, p) = b(k; n_1 + n_2, p)$$

and interpret it probabilistically. Hint: Use II(6.4).

Note: Equation (10.4) is a special case of convolutions, to be introduced in
chapter XI; (10.5) is another example.

43. Verify the identity

(10.5)    $$\sum_{\nu=0}^{k} p(\nu; \lambda_1)\, p(k - \nu; \lambda_2) = p(k; \lambda_1 + \lambda_2).$$

44. Let

(10.6)    $$B(k; n, p) = \sum_{\nu=0}^{k} b(\nu; n, p)$$

be the probability of at most k successes in n trials. Then

(10.7)    $$B(k; n+1, p) = B(k; n, p) - p\,b(k; n, p),$$
          $$B(k+1; n+1, p) = B(k; n, p) + q\,b(k+1; n, p).$$

Verify this (a) from the definition, (b) analytically.
45. With the same notation

(10.8)    $$B(k; n, p) = (n - k)\binom{n}{k} \int_0^{q} t^{\,n-k-1}(1-t)^{k}\, dt$$

and

(10.9)    $$1 - B(k; n, p) = n\binom{n-1}{k} \int_0^{p} t^{\,k}(1-t)^{n-k-1}\, dt.$$

Hint: Integrate by parts or differentiate both sides with respect to p.
Deduce one formula from the other.

Note: The integral in (10.9) is the incomplete beta function. Tables of
1 − B(k; n, p) to 7 decimals for k and n up to 50 and p = 0.01, 0.02, 0.03, ...
are given in K. Pearson, Tables of the incomplete beta function, London
(Biometrika Office), 1934.

46. Prove

(10.10)    $$p(0; \lambda) + \cdots + p(n; \lambda) = \frac{1}{n!} \int_{\lambda}^{\infty} e^{-x} x^{n}\, dx.$$


Note: In the following problems we give an upper bound for all the terms of the
binomial distribution. The calculations are quite simple and the method can be
improved to give the simplest derivation of the DeMoivre-Laplace limit theorem (cf.
problems VII, 19-21). Put for abbreviation

(10.11)    $$t_k = \frac{k - (n+1)p + \tfrac12}{\{(n+1)pq\}^{1/2}}$$

and let m be the index of the central term; that is, m is the integer satisfying (3.2).


47. Prove that for r > (n + 1)p 
(10.12) b(r; n, p) < b(m; n, p) eirt? + 


where $\delta = m - (n+1)p + \tfrac12$, whence $|\delta| \le \tfrac12$.
Hint: Rewrite (3.1) in the form

(10.13)    $$\frac{b(k; n, p)}{b(k-1; n, p)} = \frac{(n+1)pq - \{k - (n+1)p\}p}{(n+1)pq + \{k - (n+1)p\}q}.$$
Conclude that for k > (n + 1)p 

bikin P) clog fy k= + DP k= @+)p 
028) oe Eee S (n+ Ipaq SG + pe 


whence the assertion follows by summation.

48. For r < (n + 1)p the inequality (10.12) holds with the factor p in the 
exponent replaced by q. Hence, if p is replaced by pg, the inequality holds 
for all r. 


CHAPTER VII 


The Normal Approximation
to the Binomial Distribution


1. THE NORMAL DISTRIBUTION 


In order to avoid later interruptions we pause here to introduce two 
functions of great importance. 


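Definition. The function defined by

(1.1)    $$\varphi(x) = \frac{1}{(2\pi)^{1/2}}\, e^{-x^2/2}$$

is called the normal density function; its integral

(1.2)    $$\Phi(x) = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$$

is the normal distribution function.


The graph of $\varphi(x)$ is the symmetric, bell-shaped curve shown in
figure 1. Note that different units are used along the two axes: The
maximum of $\varphi(x)$ is $(2\pi)^{-1/2} = 0.399$, approximately, so that in an
ordinary Cartesian system the curve $y = \varphi(x)$ would be much flatter.


Lemma 1. The domain bounded by the graph of $\varphi(x)$ and the x-axis
has unit area, that is,

(1.3)    $$\int_{-\infty}^{+\infty} \varphi(x)\, dx = 1.$$

Proof. We have

(1.4)    $$\left\{\int_{-\infty}^{+\infty} \varphi(x)\, dx\right\}^{2} = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \varphi(x)\varphi(y)\, dx\, dy = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} e^{-(x^2+y^2)/2}\, dx\, dy.$$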
FIGURE 1. The normal density function. (The figure marks the abscissas
±0.67, ±1, ±2, ±3 together with the percentage of the area lying over each
central interval.)

FIGURE 2. The normal distribution function.
This double integral can be expressed in polar coordinates thus:

(1.5)    $$\frac{1}{2\pi}\int_0^{2\pi} d\theta \int_0^{\infty} e^{-r^2/2}\, r\, dr = \int_0^{\infty} e^{-r^2/2}\, r\, dr = -e^{-r^2/2}\Big|_0^{\infty} = 1,$$

which proves the assertion.

It follows from the definition and the lemma that $\Phi(x)$ increases
steadily from 0 to 1. Its graph (figure 2) is an S-shaped curve with

(1.6)    $$\Phi(-x) = 1 - \Phi(x).$$

Table 1 gives the values$^1$ of $\Phi(x)$ for positive x, and from (1.6) we get
$\Phi(-x)$.

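For many purposes it is convenient to have an elementary estimate
of the "tail," $1 - \Phi(x)$, for large x. Such an estimate is given by


Lemma 2.$^2$ As x → ∞

(1.7)    $$1 - \Phi(x) \sim \frac{1}{(2\pi)^{1/2}\, x}\, e^{-x^2/2};$$

more precisely, for every x > 0 the double inequality

(1.8)    $$\frac{1}{(2\pi)^{1/2}}\, e^{-x^2/2}\left\{\frac{1}{x} - \frac{1}{x^3}\right\} < 1 - \Phi(x) < \frac{1}{(2\pi)^{1/2}}\, e^{-x^2/2}\, \frac{1}{x}$$

holds (cf. problem 1).

Proof. By differentiation we may verify that

(1.9)    $$\frac{1}{(2\pi)^{1/2}}\, \frac{1}{x}\, e^{-x^2/2} = \frac{1}{(2\pi)^{1/2}} \int_x^{\infty} e^{-y^2/2}\left\{1 + \frac{1}{y^2}\right\} dy.$$

The integrand on the right side is greater than the integrand of

(1.10)    $$1 - \Phi(x) = \frac{1}{(2\pi)^{1/2}} \int_x^{\infty} e^{-y^2/2}\, dy,$$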

1 For larger tables cf. Tables of probability functions, vol. 2, National Bureau of
Standards, New York, 1942. There $\varphi(x)$ and $\Phi(x) - \Phi(-x)$ are given to 15
decimals for x from 0 to 1 in steps of 0.0001 and for x > 1 in steps of 0.001.

2 Here and in the sequel the sign ~ is used to indicate that the ratio of the two
sides tends to one.
TABLE 1

THE NORMAL DISTRIBUTION

  x      φ(x)         Φ(x)
 0.0   0.398 942    0.500 000
 0.1    .396 952     .539 828
 0.2    .391 043     .579 260
 0.3    .381 388     .617 911
 0.4    .368 270     .655 422
 0.5    .352 065     .691 462
 0.6    .333 225     .725 747
 0.7    .312 254     .758 036
 0.8    .289 692     .788 145
 0.9    .266 085     .815 940
 1.0    .241 971     .841 345
 1.1    .217 852     .864 334
 1.2    .194 186     .884 930
 1.3    .171 369     .903 200
 1.4    .149 727     .919 243
 1.5    .129 518     .933 193
 1.6    .110 921     .945 201
 1.7    .094 049     .955 435
 1.8    .078 950     .964 070
 1.9    .065 616     .971 283
 2.0    .053 991     .977 250
 2.1    .043 984     .982 136
 2.2    .035 475     .986 097
 2.3    .028 327     .989 276
 2.4    .022 395     .991 802
 2.5    .017 528     .993 790
 2.6    .013 583     .995 339
 2.7    .010 421     .996 533
 2.8    .007 915     .997 445
 2.9    .005 953     .998 134
 3.0    .004 432     .998 650
 3.1    .003 267     .999 032
 3.2    .002 384     .999 313
 3.3    .001 723     .999 517
 3.4    .001 232     .999 663
 3.5    .000 873     .999 767
 3.6    .000 612     .999 841
 3.7    .000 425     .999 892
 3.8    .000 292     .999 928
 3.9    .000 199     .999 952
 4.0    .000 134     .999 968
 4.1    .000 089     .999 979
 4.2    .000 059     .999 987
 4.3    .000 039     .999 991
 4.4    .000 025     .999 995
 4.5    .000 016     .999 997


which proves the second inequality in (1.8). The first inequality follows
in the same way, using as new integrand $e^{-y^2/2}\{1 - 3/y^4\}$ which
is smaller than $e^{-y^2/2}$.
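The double inequality (1.8) can be inspected numerically. The sketch below assumes the standard relation $1 - \Phi(x) = \tfrac12\,\mathrm{erfc}(x/\sqrt2)$ to evaluate the tail; the function names are illustrative and not part of the text.

```python
import math

def normal_tail(x: float) -> float:
    """1 - Phi(x), evaluated through the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phi(x: float) -> float:
    """Normal density (1.1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def tail_bounds(x: float) -> tuple[float, float]:
    """Lower and upper bounds of (1.8), valid for x > 0."""
    lower = phi(x) * (1.0 / x - 1.0 / x ** 3)
    upper = phi(x) / x
    return lower, upper

for x in (1.0, 2.0, 3.0, 4.5, 8.0):
    lower, upper = tail_bounds(x)
    print(x, lower, normal_tail(x), upper)
```

For small x the lower bound is negative and therefore uninformative; the two bounds close in on the tail only as x grows, in agreement with (1.7).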
Note on Terminology. The term distribution function is used in the
mathematical literature for never-decreasing functions of x which tend to 0 as
x → −∞, and to 1 as x → ∞. Statisticians currently prefer the term cumulative
distribution function, but the adjective "cumulative" is redundant. A density
function is a non-negative function f(x) whose integral, extended over the entire
x-axis, is unity. The integral from −∞ to x of any density function is a
distribution function. The older term frequency function is a synonym for density function.
The normal distribution function is often called the Gaussian distribution, but
it was used in probability theory earlier by DeMoivre and Laplace. If the origin
and the unit of measurement are changed, then $\Phi(x)$ is transformed into $\Phi((x - a)/b)$;
this function is called the normal distribution function with mean a and variance
$b^2$ (or standard deviation |b|). The function $2\Phi(x\,2^{1/2}) - 1$ is often called error
function.


2. THE DeMOIVRE-LAPLACE LIMIT THEOREM 


Let $S_n$ stand for the number of successes in n Bernoulli trials with
probability p for success. Then b(k; n, p) is the probability of the
event that $S_n = k$. In practice we are usually interested in the
probability of the event that the number of successes lies between preassigned
limits α and β. If α and β are integers and α < β, then this event is
defined by the inequality $\alpha \le S_n \le \beta$, and its probability is

(2.1)    $$P\{\alpha \le S_n \le \beta\} = b(\alpha; n, p) + b(\alpha+1; n, p) + \cdots + b(\beta; n, p).$$

This sum may involve many terms, and a direct evaluation is usually
impractical. Fortunately, whenever n is large, the normal distribution
function can be used to derive simple approximations to the probability
(2.1). This discovery is due to DeMoivre$^3$ and Laplace.$^4$ We shall
see that its importance goes far beyond the domain of numerical
calculations.

Our first aim is to derive an asymptotic formula for the individual
terms

(2.2)    $$b(k; n, p) = \frac{n!}{k!\,(n-k)!}\; p^k q^{n-k}.$$

The probability p will be kept fixed, but we shall let n → ∞. According
to the law of large numbers [VI(4.2)], the probability that
$|S_n - np| > n\epsilon$ tends to zero for each ε > 0, and therefore only values
of k such that $|k - np|\,n^{-1} \to 0$ present a problem. It is now
convenient to introduce the new variable $\delta_k = k - np$. Then

(2.3)    $$k = np + \delta_k, \qquad n - k = nq - \delta_k,$$

and we are interested only in combinations n, k such that n → ∞ and
$\delta_k/n \to 0$.

3 Abraham DeMoivre (1667-1754). His The doctrine of chances appeared in 1718.

4 Pierre S. Laplace (1749-1827). His Théorie analytique des probabilités appeared
in 1812.


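Expressing the factorials in (2.2) by means of Stirling's formula
II(9.1), we get$^5$

(2.4)    $$b(k; n, p) \sim \left\{\frac{n}{2\pi k(n-k)}\right\}^{1/2} \left(\frac{np}{k}\right)^{k} \left(\frac{nq}{n-k}\right)^{n-k} = \left\{\frac{n}{2\pi(np+\delta_k)(nq-\delta_k)}\right\}^{1/2} \frac{1}{(1+\delta_k/np)^{np+\delta_k}\,(1-\delta_k/nq)^{nq-\delta_k}},$$

where the sign ~ indicates that the ratio of the two sides tends to unity.
To evaluate the last fraction we pass to logarithms. In the interval
$|\delta_k| < npq$ we may use Taylor's expansion II(8.9) and find for the
logarithm of the denominator

(2.5)    $$(np + \delta_k)\log(1 + \delta_k/np) + (nq - \delta_k)\log(1 - \delta_k/nq) =$$
         $$= (np + \delta_k)\left\{\frac{\delta_k}{np} - \frac{\delta_k^2}{2n^2p^2} + \frac{\delta_k^3}{3n^3p^3} - \cdots\right\} - (nq - \delta_k)\left\{\frac{\delta_k}{nq} + \frac{\delta_k^2}{2n^2q^2} + \frac{\delta_k^3}{3n^3q^3} + \cdots\right\}.$$

Reordering the terms according to powers of $\delta_k$, we get

(2.6)    $$\frac{\delta_k^2}{2n}\left(\frac1p + \frac1q\right) - \frac{\delta_k^3}{6n^2}\left(\frac{1}{p^2} - \frac{1}{q^2}\right) + \cdots = \frac{\delta_k^2}{2npq}\left\{1 + \frac{p-q}{3pq}\cdot\frac{\delta_k}{n} + \cdots\right\}.$$

Here the term $\delta_k^2/2npq$ is dominant, since $\delta_k/n \to 0$. If we
suppose that $\delta_k^3/n^2 \to 0$, then all terms in (2.6) except the first tend to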

5 It will be recalled that in chapter II we did not complete the proof of Stirling's
formula but showed only that $r! \sim C r^{r+1/2} e^{-r}$, where C is a positive constant. In
the text it is assumed that $C = (2\pi)^{1/2}$. If we want to prove this fact, then the
factor $(2\pi)^{1/2}$ in equations (2.4), (2.7), and (2.8) must be replaced by C. In this case
a factor $C^{-1}(2\pi)^{1/2}$ must be inserted on the right sides in (2.11), (2.14), and (2.18).
To show that this factor really equals 1 it suffices to choose $x_\beta$ and $-x_\alpha$ very large.
The right side in the modified equation (2.18) is then arbitrarily near to $C^{-1}(2\pi)^{1/2}$,
and the left side is near 1 by the estimates of chapter VI, section 3.
zero, and (2.4) takes on the simpler form

(2.7)    $$b(k; n, p) \sim \left\{\frac{n}{2\pi(np+\delta_k)(nq-\delta_k)}\right\}^{1/2} e^{-\delta_k^2/2npq}.$$

However, $np + \delta_k \sim np$ and $nq - \delta_k \sim nq$, and so (2.7) may be
further simplified to

(2.8)    $$b(k; n, p) \sim \frac{1}{(2\pi npq)^{1/2}}\; e^{-\delta_k^2/2npq}.$$

This is the desired asymptotic formula. We simplify it by the use
of a more convenient notation. Put

(2.9)    $$h = \frac{1}{(npq)^{1/2}}$$

and define a function $x_k$ of the variable k by

(2.10)    $$x_k = (k - np)h = \frac{\delta_k}{(npq)^{1/2}}.$$

In terms of these quantities we can rewrite (2.8) in the form

(2.11)    $$b(k; n, p) \sim h\varphi(x_k).$$

To derive this formula we had to suppose that n → ∞ and k → ∞
in such a way that $\delta_k n^{-1} \to 0$ and also $\delta_k^3 n^{-2} \to 0$. The last
condition obviously implies the first and is the same as $x_k^3 n^{-1/2} \to 0$. We
have thus


Theorem 1. If n → ∞ and k → ∞ in such a way that $x_k^3 n^{-1/2} \to 0$,
then (2.11) holds. More precisely, we have shown that there exist two
constants A and B such that

(2.12)    $$\left|\frac{b(k; n, p)}{h\varphi(x_k)} - 1\right| < \frac{A}{n} + \frac{B|x_k|^3}{n^{1/2}}.$$

(For an alternative for (2.11) see problems 19 and 21.)


Figure 3 illustrates the theorem in the case n = 10, p = 0.2 where
npq is only 1.6. It is seen that even in this extremely unfavorable case
the approximation is surprisingly good.$^6$


6 The values of b(k; 10, 0.2) for k = 0, 1, ..., 6 are 0.1074, 0.2684, 0.3020, 0.2013,
0.0880, 0.0264, 0.0055. The corresponding approximations $h\varphi(x_k)$ are 0.0904,
0.2307, 0.3154, 0.2307, 0.0904, 0.0189, 0.0021.
FIGURE 3. The normal approximation to the binomial distribution. The step
function gives the probabilities b(k; 10, 1/5) of k successes in ten Bernoulli trials with
p = 1/5. The continuous curve gives for each integer k the corresponding normal
approximation. (Horizontal axis: $S_n$ = 0, 1, ..., 6.)


Our theorem leads directly to simple approximations for the sum
(2.1). If

(2.13)    $$h x_\alpha^3 \to 0 \quad\text{and}\quad h x_\beta^3 \to 0,$$

then (2.11) holds uniformly for all terms in (2.1), and therefore

(2.14)    $$P\{\alpha \le S_n \le \beta\} \sim h\{\varphi(x_\alpha) + \varphi(x_{\alpha+1}) + \cdots + \varphi(x_\beta)\}.$$

The right side is a Riemann sum approximating an integral$^7$ and we
proceed to investigate the goodness of this approximation.


By the mean value theorem there exists a value $\xi_k$ such that

(2.15)    $$\Phi(x_{k+\frac12}) - \Phi(x_{k-\frac12}) = h\varphi(\xi_k), \qquad x_k - \tfrac12 h < \xi_k < x_k + \tfrac12 h.$$


7It is clear that (zz) — #(zx-3) represents the area of the trapezoid with 
basis z — 3h < z < Tk + 3h and bounded above by the tangent to the curve 
y = ¢(z) at z = z and ho(z:) represents the area of a rectangle with the same basis. 
Then

(2.16)    $$h\varphi(x_k) = e^{-(x_k^2 - \xi_k^2)/2}\,\{\Phi(x_{k+\frac12}) - \Phi(x_{k-\frac12})\}.$$

Choose an arbitrary ε > 0. If (2.13) holds, then for all α ≤ k ≤ β
and n sufficiently large

$$\tfrac12|\xi_k^2 - x_k^2| = \tfrac12|\xi_k - x_k| \cdot |\xi_k + x_k| < h\{|x_k| + \tfrac12 h\} < \epsilon$$

and hence

(2.17)    $$e^{-\epsilon}\{\Phi(x_{k+\frac12}) - \Phi(x_{k-\frac12})\} < h\varphi(x_k) < e^{\epsilon}\{\Phi(x_{k+\frac12}) - \Phi(x_{k-\frac12})\}.$$

Adding over k, we see that the ratio of the right-hand member in
(2.14) to $\Phi(x_{\beta+\frac12}) - \Phi(x_{\alpha-\frac12})$ tends to one. We have thus proved the

DeMoivre-Laplace Limit Theorem. If α and β vary so that
$h x_\alpha^3 \to 0$ and $h x_\beta^3 \to 0$, then

(2.18)    $$P\{\alpha \le S_n \le \beta\} \sim \Phi(x_{\beta+\frac12}) - \Phi(x_{\alpha-\frac12}),$$

where $h = (npq)^{-1/2}$ and $x_t = (t - np)h$. In words, the percentage
difference between the two sides in (2.18) tends to zero together with $h x_\beta^3$
and $h x_\alpha^3$.


are large. In such cases both sides of (2.18) are small, and it becomes 


important to know that their ratio is near unity as well as that their 
difference tends to zero. 


The limit theorem (2.18) takes on a simpler form if, instead of $S_n$,
we introduce the reduced number of successes defined by

(2.19)    $$S_n^* = \frac{S_n - np}{(npq)^{1/2}}.$$

This amounts to measuring the deviations of $S_n$ from np in units of
$(npq)^{1/2}$. The quantity np is called the mean, and $(npq)^{1/2}$ the standard
deviation of $S_n$; this terminology is suggested by the theory of random
variables (cf. chapter IX). The inequality $\alpha \le S_n \le \beta$ is the same as
$x_\alpha \le S_n^* \le x_\beta$, and (2.18) states that for arbitrary fixed $x_\alpha < x_\beta$

(2.20)    $$P\{x_\alpha \le S_n^* \le x_\beta\} \sim \Phi\!\left(x_\beta + \frac h2\right) - \Phi\!\left(x_\alpha - \frac h2\right),$$

where $h = (npq)^{-1/2}$. Now h → 0 as n → ∞, and therefore the right
side tends to $\Phi(x_\beta) - \Phi(x_\alpha)$. Thus we have the following


Corollary to the Limit Theorem. For every fixed a < b

(2.21)    $$P\{a \le S_n^* \le b\} \to \Phi(b) - \Phi(a).$$


This is a weakened version of (2.18) but represents the traditional
form of Laplace's limit theorem. The dropping of h/2 in (2.20)
introduces an error which tends to zero as n → ∞ but has a considerable
influence when npq is of moderate magnitude [as is the case in the
three examples 3(a)-(c)].

The main fact revealed by (2.21) is that for large n the probability 
on the left is practically independent of p. This permits us to compare 
fluctuations in different series of Bernoulli trials simply by referring to 
our standard units. 


Theorem (2.21) is historically the first limit theorem of probability. From a
modern point of view it is only an exceedingly special case of the central limit
theorem, to which we shall return in chapter X but whose general derivation must
be postponed to the second volume. Statisticians use (2.21) as an approximation
even where npq is relatively small, and in such cases an estimate of the error is
desired. It turns out that in most cases the error in (2.11) is small as compared to
the error committed by replacing the sum in (2.14) by the integral. (Fortunately
this error can be avoided by the use of the Euler-MacLaurin summation formula.)
Serge Bernstein devoted a series of papers to the investigation of the error term in
the general case and discussed how the definition of $x_k$ should be modified in order
to improve the convergence in (2.18). His papers are written in Russian and are
difficult to obtain. A simplified derivation with an improvement of his results is,
however, available in English.$^8$


Note on Optional Stopping 

It is essential to note that our limit and approximation theorems are
valid only if the number n of trials is fixed in advance independently of
the outcome of the trials. If a gambler has the privilege of stopping at
a moment favorable to him, his ultimate gain cannot be judged from
the normal approximation, for now the duration of the game depends
on chance. For every fixed n it is very improbable that $S_n^*$ is large.
However, in the long run, even the most improbable thing is bound to
happen, and we shall see that in a continued game $S_n^*$ is practically
certain to have a sequence of maxima of the order of magnitude
$(\log \log n)^{1/2}$ (this is the law of the iterated logarithm of chapter VIII,
section 5).

8 W. Feller, On the normal approximation to the binomial distribution, Annals
of Mathematical Statistics, vol. 16 (1945), pp. 319-329.

3. EXAMPLES 

(a) Let $p = \tfrac{1}{2}$, $n = 200$, $\alpha = 95$, $\beta = 105$. Here $P\{95 \le S_n \le 105\}$
may be interpreted as the probability that in 200 tossings of a coin
the number of heads deviates from 100 by at most 5. We have
$h = (50)^{-1/2} = 0.141421\ldots$ and $-x_{\alpha-1/2} = x_{\beta+1/2} = (5.5)h = 0.7778\ldots$.
From tables we get $\Phi(x_{\beta+1/2}) - \Phi(x_{\alpha-1/2}) = 0.56331\ldots$. The true value
(again obtainable from tables) is $0.56325\ldots$. The error is ridiculously
small, but only because of the accident that in the interval in question
the integral overestimates the sum in (2.14) and the approximation
(2.11) underestimates each term.

(b) Let $p = \tfrac{1}{10}$, $n = 500$, $\alpha = 50$, $\beta = 55$. The correct value is
$P\{50 \le S_n \le 55\} = 0.317573\ldots$. Now $h = (45)^{-1/2} = 0.1490712\ldots$,
and we get the approximation $\Phi(5.5h) - \Phi(-0.5h) = 0.3235\ldots$. The
error is about 2 per cent.
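
The figures of examples (a) and (b) can be checked numerically. The following Python sketch is an illustration only and not part of the original text; the function names are hypothetical, and the normal distribution function is built from the standard library's `erf`. It evaluates the exact binomial sum and the approximation $\Phi(x_{\beta+1/2}) - \Phi(x_{\alpha-1/2})$ with $x_t = (t - np)h$.

```python
from math import comb, erf, sqrt


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def binomial_interval(n, p, alpha, beta):
    """Exact P{alpha <= S_n <= beta} for n Bernoulli trials."""
    q = 1.0 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(alpha, beta + 1))


def normal_interval(n, p, alpha, beta):
    """Normal approximation Phi(x_{beta+1/2}) - Phi(x_{alpha-1/2})."""
    h = 1.0 / sqrt(n * p * (1.0 - p))      # h = (npq)^(-1/2)
    upper = (beta + 0.5 - n * p) * h
    lower = (alpha - 0.5 - n * p) * h
    return normal_cdf(upper) - normal_cdf(lower)


# Parameters of examples (a) and (b)
for n, p, alpha, beta in [(200, 0.5, 95, 105), (500, 0.1, 50, 55)]:
    exact = binomial_interval(n, p, alpha, beta)
    approx = normal_interval(n, p, alpha, beta)
    print(n, p, round(exact, 5), round(approx, 5),
          "relative error %.1f%%" % (100 * (approx - exact) / exact))
```

The same two functions can be applied to any other interval to see how the agreement changes away from the central term.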

(c) Let n = 100, p = 0.3. Table 2 shows in a typical example (for
relatively small n) how the normal approximation deteriorates as the
interval $(\alpha, \beta)$ moves away from the central term.


TABLE 2

COMPARISON OF THE BINOMIAL DISTRIBUTION FOR n = 100, p = 0.3
AND THE NORMAL APPROXIMATION

Number of                           Normal           Percentage
Successes            Probability    Approximation    Error

 9 ≤ S_n ≤ 11        0.000 006      0.000 03         +400
12 ≤ S_n ≤ 14        0.000 15       0.000 33         +100
15 ≤ S_n ≤ 17        0.002 01       0.002 83          +40
18 ≤ S_n ≤ 20        0.014 30       0.015 99          +12
21 ≤ S_n ≤ 23        0.059 07       0.058 95            0
24 ≤ S_n ≤ 26        0.148 87       0.144 47           -3
27 ≤ S_n ≤ 29        0.237 94       0.234 05           -2
31 ≤ S_n ≤ 33        0.230 13       0.234 05           +2
34 ≤ S_n ≤ 36        0.140 86       0.144 47           +3
37 ≤ S_n ≤ 39        0.058 89       0.058 95            0
40 ≤ S_n ≤ 42        0.017 02       0.015 99           -6
43 ≤ S_n ≤ 45        0.003 43       0.002 83          -18
46 ≤ S_n ≤ 48        0.000 49       0.000 33          -33
49 ≤ S_n ≤ 51        0.000 05       0.000 03          -40

(d) Let us find a number a such that, for large n, the inequality
$|S_n^*| > a$ has a probability near $\tfrac{1}{2}$. For this it is necessary that
$\Phi(a) - \Phi(-a) = \tfrac{1}{2}$ or $\Phi(a) = \tfrac{3}{4}$. From tables of the normal
distribution we find that a = 0.6745, and hence the two inequalities


(3.1)    $|S_n - np| < 0.6745(npq)^{1/2}$ and $|S_n - np| > 0.6745(npq)^{1/2}$


are about equally probable. In particular, the probability is about $\tfrac{1}{2}$
that in n tossings of a coin the number of heads lies within the limits
$n/2 \pm 0.337\,n^{1/2}$, and, similarly, that in n throws of a die the number of
aces lies within the interval $n/6 \pm 0.251\,n^{1/2}$. The probability of $S_n$ lying
within the limits $np \pm 2(npq)^{1/2}$ is about $\Phi(2) - \Phi(-2) = 0.9545\ldots$,
and for $np \pm 3(npq)^{1/2}$ the probability is $0.9973\ldots$.
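
The constants of example (d) can be reproduced without tables. The sketch below is illustrative and assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

std = NormalDist()  # standard normal distribution (mean 0, variance 1)

# Root a of Phi(a) = 3/4, i.e. Phi(a) - Phi(-a) = 1/2
a = std.inv_cdf(0.75)


def half_width(p):
    """Coefficient c for which |S_n - np| < c * sqrt(n) has probability about 1/2."""
    return a * sqrt(p * (1.0 - p))


def central_probability(t):
    """Phi(t) - Phi(-t), the approximate probability of |S_n - np| < t (npq)^(1/2)."""
    return std.cdf(t) - std.cdf(-t)


print(round(a, 4))                        # the constant in (3.1)
print(round(half_width(1 / 2), 3))        # coin tossing
print(round(half_width(1 / 6), 3))        # aces in throws of a die
print(round(central_probability(2), 4), round(central_probability(3), 4))
```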

(e) A competition problem. This example illustrates practical
applications of formula (2.21). Two competing railroads operate one train
each between Chicago and Los Angeles; the two trains leave and arrive
simultaneously and have comparable equipment. We suppose that n
passengers select trains independently and at random so that the number
of passengers in each train is the outcome of n Bernoulli trials with
$p = \tfrac{1}{2}$. If a train carries s < n seats, then there is a positive
probability f(s) that more than s passengers will turn up, in which case not
all patrons can be accommodated. Using the approximation (2.21),
we find

(3.2)    $f(s) \approx 1 - \Phi\left(\frac{2s - n}{n^{1/2}}\right).$

If s is so large that f(s) < 0.01, then the number of seats will be
sufficient in 99 out of 100 cases. More generally, the company may decide
on an arbitrary risk level $\alpha$ and determine s so that $f(s) < \alpha$. For that
purpose it suffices to put

(3.3)    $s \ge \tfrac{1}{2}\left(n + t_\alpha n^{1/2}\right),$

where $t_\alpha$ is the root of the equation $\alpha = 1 - \Phi(t_\alpha)$, which can be found
from tables. For example, if n = 1000 and $\alpha = 0.01$, then $t_\alpha \approx 2.33$
and s = 537 seats should suffice. If both railroads accept the risk
level $\alpha = 0.01$, the two trains will carry a total of 1074 seats of which
74 will be empty. The loss from competition (or chance fluctuations)
is remarkably small. In the same way, 514 seats should suffice in
about 80 per cent of all cases, and 549 seats in 999 out of 1000 cases.
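
The seat counts quoted in this example follow from (3.3) by taking the smallest admissible integer. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist` and using the hypothetical helper name `seats_needed`:

```python
from math import ceil, sqrt
from statistics import NormalDist


def seats_needed(n, alpha):
    """Smallest integer s with s >= (n + t_alpha * sqrt(n)) / 2, as in (3.3).

    t_alpha is the root of alpha = 1 - Phi(t_alpha).
    """
    t_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return ceil(0.5 * (n + t_alpha * sqrt(n)))


n = 1000
for alpha in (0.2, 0.01, 0.001):
    s = seats_needed(n, alpha)
    # With two competing trains the number of empty seats is 2s - n
    print(alpha, s, 2 * s - n)
```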

Similar considerations apply in other competitive supply problems.
For example, if m movies compete for the same n patrons, each movie
will put for its probability of success p = 1/m, and (3.3) is to be
replaced by $s \ge (1/m)\left[n + t_\alpha n^{1/2}(m - 1)^{1/2}\right]$. The total number of empty
seats under this system is $ms - n \approx t_\alpha n^{1/2}(m - 1)^{1/2}$. For $\alpha = 0.01$,
n = 1000, and m = 2, 3, 4, 5, this number is about 74, 104, 127, and 147,
respectively. The loss of efficiency because of competition is again
small.

(f) Random digits. In example II(3.b), we considered an event with
p = 0.3024. In n = 1200 trials this event had an average frequency of
0.3142. The deviation from p is $\epsilon = 0.0118$. In this case $(pq)^{1/2} = 0.4593$
and $\epsilon (n/pq)^{1/2} \approx 0.890\ldots$. Hence the probability of $|S_n/n - p| > \epsilon$ is
in this case about $0.37\ldots$. This indicates that in about 37 per cent
of all cases the average number of successes should deviate from p by
more than it does in our material.
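
The arithmetic of example (f) is short enough to verify directly. The following sketch is illustrative (the function name is hypothetical, and `statistics.NormalDist` requires Python 3.8+); it returns the reduced deviation $\epsilon (n/pq)^{1/2}$ together with the two-sided normal tail probability.

```python
from math import sqrt
from statistics import NormalDist


def deviation_probability(n, p, eps):
    """Normal approximation to P{|S_n/n - p| > eps}.

    Returns the reduced deviation x = eps * (n / pq)^(1/2) and the
    two-sided tail probability 2 * (1 - Phi(x)).
    """
    x = eps * sqrt(n / (p * (1.0 - p)))
    return x, 2.0 * (1.0 - NormalDist().cdf(x))


x, prob = deviation_probability(n=1200, p=0.3024, eps=0.0118)
print(round(x, 3), round(prob, 2))
```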

(g) Sampling. A fraction p of a certain population are smokers.
Suppose that p is unknown and that random sampling with replacement
is to be used to determine p. It is desired to find p with an error
not exceeding 0.005. How large should the sample size n be? If p' is
the fraction of smokers in the sample, we desire that |p' - p| < 0.005.
However, no sample size can give absolute assurance that |p' - p| <
0.005; it is conceivable that the sample contains only smokers.
Since absolute certainty is unattainable, we settle for an arbitrary
confidence level $\alpha$, say, $\alpha = 0.95$, and require that |p' - p| < 0.005
with probability 0.95 or better. Note that np' is the number of
successes in n trials, and hence

$P\{|p' - p| < 0.005\} = P\left\{\left|\frac{S_n - np}{n}\right| < 0.005\right\}.$

We seek an n large enough to make this quantity greater than 0.95.
For the present purposes the normal approximation is sufficient. The
root x of $\Phi(x) - \Phi(-x) = 0.95$ is $x = 1.96\ldots$, and hence we should
have $0.005(n/pq)^{1/2} \ge 1.96$. Thus we are led to the inequality $n \ge 392^2 pq$
or $n \ge 160{,}000\,pq$, approximately. It involves the unknown p, but pq
never exceeds $\tfrac{1}{4}$, and hence the sample size n = 40,000 would be safe
under all circumstances; with it the odds are about 20 to 1 that
|p' - p| < 0.005.
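
The sample-size rule of example (g) can be written as a small function. This is a sketch under the normal approximation only; the names `sample_size` and `coverage` are hypothetical, and the default `pq=0.25` encodes the worst case used in the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size(error, confidence, pq=0.25):
    """Smallest n with error * (n / pq)^(1/2) >= x, where Phi(x) - Phi(-x) = confidence."""
    x = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil((x / error) ** 2 * pq)


def coverage(n, error, pq=0.25):
    """Approximate P{|p' - p| < error} for sample size n."""
    x = error * sqrt(n / pq)
    return 2.0 * NormalDist().cdf(x) - 1.0


n_min = sample_size(error=0.005, confidence=0.95)
print(n_min)                                # a little below the rounded value 40,000
print(round(coverage(40_000, 0.005), 4))    # confidence attained with n = 40,000
```

Rounding the requirement up to n = 40,000, as the text does, raises the attained probability slightly above the nominal 0.95.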


4. RELATION TO THE POISSON APPROXIMATION 


The error of the normal approximation will be small if $npq$ is large.
On the other hand, if n is large and p small, the terms b(k; n, p) will be
found to be near the Poisson probabilities $p(k; \lambda)$ with $\lambda = np$. If $\lambda$
is small, then only the Poisson approximation can be used. However,
if $\lambda$ is large, we can use either the normal or the Poisson approximation.
This implies that for large values of $\lambda$ it must be possible to approximate
the Poisson distribution by the normal distribution, and in example
X(1.c) we shall see that this is indeed so (cf. also problem 9).
Here we shall be content to illustrate the point by a numerical and a
practical example.


Examples. (a) Consider the Poisson distribution p(k; 100) as an
approximation, say, to the binomial distribution with n = 100,000,000
and p = 1/1,000,000. Then $npq \approx 100$; this quantity, even though
not large, suffices for the normal distribution to give reasonable
approximations at least for the central sector of the binomial distribution.
The Poisson distribution p(k; 100) agrees with $b(k; 10^8, 10^{-6})$ to many
decimals, and we can compare it with the normal approximation to the
latter. Put, for brevity, $P(a, b) = p(a; 100) + p(a+1; 100) + \cdots + p(b; 100)$,
so that P(a, b) stands for $P\{a \le S_n \le b\}$ and should be
approximated by $\Phi\left(\frac{b - 99.5}{10}\right) - \Phi\left(\frac{a - 100.5}{10}\right)$. The following
sample gives an idea of the degree of approximation.


                 Correct Values    Normal Approximation

P(85, 90)        0.113 84          0.110 49
P(90, 95)        0.184 85          0.179 50
P(95, 105)       0.417 63          0.417 68
P(90, 110)       0.706 52          0.706 28
P(110, 115)      0.107 38          0.110 49
P(115, 120)      0.053 23          0.053 35
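
The comparison in this table can be recomputed with a few lines of Python. The sketch is illustrative, with hypothetical function names; Poisson terms are evaluated through logarithms (`lgamma`) so that $100^k / k!$ never overflows.

```python
from math import erf, exp, lgamma, log, sqrt

LAM = 100.0  # Poisson parameter lambda of the example


def poisson_pmf(k, lam=LAM):
    """Poisson probability p(k; lam), computed via logarithms."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def poisson_interval(a, b, lam=LAM):
    """P(a, b) = p(a; lam) + p(a+1; lam) + ... + p(b; lam)."""
    return sum(poisson_pmf(k, lam) for k in range(a, b + 1))


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def normal_interval(a, b, lam=LAM):
    """Phi((b + 1/2 - lam)/sqrt(lam)) - Phi((a - 1/2 - lam)/sqrt(lam))."""
    s = sqrt(lam)
    return normal_cdf((b + 0.5 - lam) / s) - normal_cdf((a - 0.5 - lam) / s)


for a, b in [(85, 90), (90, 95), (95, 105), (90, 110), (110, 115), (115, 120)]:
    print((a, b), round(poisson_interval(a, b), 5), round(normal_interval(a, b), 5))
```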


(b) A telephone trunking problem. The following problem is, with
some simplifications, taken from actual practice.⁹ A telephone
exchange A is to serve 2000 subscribers in a nearby exchange B. It
would be too expensive and extravagant to install 2000 trunklines from
A to B. It will suffice to make the number N of lines so large that,
under ordinary conditions, only one out of every hundred calls will
fail to find an idle trunkline immediately at its disposal. Suppose that
during the busy hour of the day each subscriber requires a trunkline
to B for an average of 2 minutes. At a fixed moment of the busy hour
we compare the situation to a set of 2000 trials with a probability
$p = \tfrac{1}{30}$ in each that a line will be required. Under ordinary
conditions these trials can be assumed to be independent (although this is
not true when events like unexpected showers or earthquakes cause
many people to call for taxicabs or the local newspaper; the theory no
longer applies, and the trunks will be "jammed"). We have, then,
2000 Bernoulli trials with $p = \tfrac{1}{30}$, and the smallest number N is
required such that the probability of more than N "successes" will be
smaller than 0.01; in symbols $P\{S_{2000} \ge N\} < 0.01$.


9 E. C. Molina, Probability in engineering, Electrical Engineering, vol. 54 (1935),
pp. 423-427, or Bell Telephone System Technical Publications Monograph B-854.
There the problem is treated by the Poisson method given in the text, which is
preferable from the engineer's point of view.


For the Poisson approximation we should take $\lambda = 2000/30 = 66.67$.
From the tables we find that the probability of 87 or more successes
is about 0.0097, whereas the probability of 86 or more successes is
about 0.013. This would indicate that 87 trunklines should suffice.
For the normal approximation we first find from tables the root x
of $1 - \Phi(x) = 0.01$, which is x = 2.327. Then it is required that
$(N - \tfrac{1}{2} - np)/(npq)^{1/2} \ge 2.327$. Since n = 2000, $p = \tfrac{1}{30}$, this means
$N \ge 67.17 + (2.327)(8.027) \approx 85.8$. Hence the normal approximation
would indicate that 86 trunklines should suffice.

For practical purposes the two solutions agree. They yield further 
practical results. Conceivably, the installation might be cheaper if 
the 2000 subscribers were divided into two groups of 1000 each, and 
two separate groups of trunklines from A to B were installed. Using 
the method above, we find that actually some ten additional trunklines 
would be required so that the first arrangement is preferable. 
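
Both solutions of the trunking problem, and the comparison with two separate groups of 1000 subscribers, can be reproduced as follows. This is a sketch with hypothetical function names, assuming Python 3.8+ for `statistics.NormalDist`; the split-group comparison is carried out here with the normal method.

```python
from math import ceil, exp, lgamma, log, sqrt
from statistics import NormalDist


def poisson_tail(N, lam):
    """P{X >= N} for a Poisson variable X with mean lam."""
    below = sum(exp(k * log(lam) - lam - lgamma(k + 1)) for k in range(N))
    return 1.0 - below


def trunks_poisson(lam, risk=0.01):
    """Smallest N with P{X >= N} < risk under the Poisson approximation."""
    N = 0
    while poisson_tail(N, lam) >= risk:
        N += 1
    return N


def trunks_normal(n, p, risk=0.01):
    """Smallest integer N with (N - 1/2 - np) / (npq)^(1/2) >= x, where 1 - Phi(x) = risk."""
    x = NormalDist().inv_cdf(1.0 - risk)
    return ceil(n * p + 0.5 + x * sqrt(n * p * (1.0 - p)))


n, p = 2000, 1 / 30
print(trunks_poisson(n * p), trunks_normal(n, p))

# Two separate groups of 1000 subscribers, each with its own trunklines
extra = 2 * trunks_normal(1000, p) - trunks_normal(n, p)
print(extra)  # additional trunklines needed by the split arrangement
```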


5. LARGE DEVIATIONS¹⁰


Frequently we desire an estimate of the probability that the reduced
number of successes $S_n^*$ [cf. (2.19)] exceeds a given number x. Hence
the upper limit of the interval is infinity, and it requires a special
argument to show that our limit theorem (2.18) still applies.


Theorem. If $n \to \infty$ and x varies as a function of n in such a way
that $x \to \infty$ but $x^3 h \to 0$, then

(5.1)    $P\{S_n^* > x\} \sim 1 - \Phi(x).$

In view of (1.7) this is equivalent to

(5.2)    $P\{S_n^* > x\} \sim \frac{1}{\sqrt{2\pi}} \, \frac{1}{x} \, e^{-x^2/2}.$


Proof. Choose in (2.18) the integers $\alpha$ and $\beta$ so that x lies between
$x_\alpha$ and $x_{\alpha+1}$, and that $x_\beta \sim x + \log x$. Then $x_\beta^3 h \to 0$ and (2.18)
holds. Hence

(5.3)    $P\{\alpha \le S_n \le \beta\} \sim \{1 - \Phi(x_\alpha)\} - \{1 - \Phi(x_\beta)\}.$

However, from (1.7) and the fact that $x_\beta \sim x_\alpha + \log x_\alpha$ it is readily
seen that $1 - \Phi(x_\beta)$ is of smaller order of magnitude than $1 - \Phi(x_\alpha)$,
while $1 - \Phi(x_\alpha) \sim 1 - \Phi(x)$. Hence

(5.4)    $P\{\alpha \le S_n \le \beta\} \sim 1 - \Phi(x).$

On the other hand, from (2.11) and VI(3.5) we have

(5.5)    $P\{S_n > \beta\} \le \frac{n}{\beta - np} \, b(\beta; n, p) \sim \frac{nh^2}{x_\beta} \, \phi(x_\beta).$

Now $nh^2 = 1/pq$ is a constant, and

(5.6)    $\frac{1}{x_\beta} \, \phi(x_\beta) \sim 1 - \Phi(x_\beta).$

We saw that the right side tends to zero faster than $1 - \Phi(x)$, which
means that $P\{S_n > \beta\}$ is of smaller order of magnitude than $1 - \Phi(x)$.
Combining this result with (5.4), we see then that

(5.7)    $P\{S_n \ge \alpha\} \sim 1 - \Phi(x),$

and this is our theorem. (Further limit theorems for large deviations
are given in problems 12-17.)


10 The theorem is of general interest but will be used in this book only for the
proof of the law of the iterated logarithm, chapter VIII, section 5.
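

The passage from (5.1) to (5.2) rests on the tail estimate $1 - \Phi(x) \sim \phi(x)/x$. The sketch below only illustrates how quickly that ratio approaches 1; it says nothing about the binomial side of the theorem. The function names are hypothetical, and `erfc` is used so that the tail keeps its precision for large x.

```python
from math import erfc, exp, pi, sqrt


def normal_tail(x):
    """1 - Phi(x), computed with erfc to avoid cancellation for large x."""
    return 0.5 * erfc(x / sqrt(2.0))


def tail_estimate(x):
    """Right side of (5.2): exp(-x^2 / 2) / (sqrt(2 pi) * x)."""
    return exp(-0.5 * x * x) / (sqrt(2.0 * pi) * x)


for x in (2.0, 4.0, 8.0):
    ratio = normal_tail(x) / tail_estimate(x)
    # The ratio stays below 1 and above x^2 / (1 + x^2)
    print(x, round(ratio, 4), round(x * x / (1.0 + x * x), 4))
```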
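

6. PROBLEMS FOR SOLUTION


1. Generalizing (1.8), prove that

(6.1)    $1 - \Phi(x) \sim \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} \left\{ \frac{1}{x} - \frac{1}{x^3} + \frac{1 \cdot 3}{x^5} - \frac{1 \cdot 3 \cdot 5}{x^7} + \cdots + (-1)^k \frac{1 \cdot 3 \cdots (2k - 1)}{x^{2k+1}} \right\}$

and that for x > 0 the right side overestimates $1 - \Phi(x)$ if k is even, and
underestimates if k is odd.

2. For every constant a > 0

(6.2)    $\left\{1 - \Phi\left(x + \frac{a}{x}\right)\right\} \Big/ \left\{1 - \Phi(x)\right\} \to e^{-a}$

as $x \to \infty$.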

3. Find the probability that among 10,000 random digits the digit 7 appears
not more than 968 times.

4. Find an approximation to the probability that the number of aces
obtained in 12,000 rollings of a die is between 1900 and 2150.

5. Find a number k such that the probability is about 0.5 that the number
of heads obtained in 1000 tossings of a coin will be between 440 and k.

6. A sample is taken in order to find the fraction f of females in a popula- 
tion. Find a sample size such that the probability of a sampling error less 
than 0.005 will be 0.99 or greater. 


7. In 10,000 tossings, a coin fell heads 5400 times. Is it reasonable to assume
that the coin is skew?

8. Find an approximation to the maximal term of the trinomial distribution

$\frac{n!}{k! \, r! \, (n - k - r)!} \, p_1^k \, p_2^r \, (1 - p_1 - p_2)^{n-k-r}.$

9. Normal approximation to the Poisson distribution. Using Stirling's
formula, show that, if $\lambda \to \infty$, then for every fixed $\alpha < \beta$

(6.3)    $\sum_{\lambda + \alpha\lambda^{1/2} < k < \lambda + \beta\lambda^{1/2}} p(k; \lambda) \to \Phi(\beta) - \Phi(\alpha).$


10. Normal approximation to the hypergeometric distribution. Let n, m, k be
positive integers and suppose that

(6.4)    $\frac{r}{n+m} \to t, \quad \frac{n}{n+m} \to p, \quad \frac{m}{n+m} \to q, \quad h\{k - rp\} \to x,$

where $1/h = \{(n + m)pqt(1 - t)\}^{1/2}$. Prove that

(6.5)    $\binom{n}{k} \binom{m}{r-k} \Big/ \binom{n+m}{r} \sim h\phi(x).$

Hint: Use the normal approximation to the binomial distribution rather
than Stirling's formula.


11. Normal distribution and combinatorial runs.¹¹ In II(11.19) we found that
in an arrangement of n alphas and m betas the probability of having exactly
k runs of alphas is

(6.6)    $\binom{n-1}{k-1} \binom{m+1}{k} \Big/ \binom{n+m}{n}.$

Let $n \to \infty$, $m \to \infty$ so that (6.4) holds. For fixed $\alpha < \beta$ the probability that
the number of alpha runs lies between $nq + \alpha q(pn)^{1/2}$ and $nq + \beta q(pn)^{1/2}$ tends
to $\Phi(\beta) - \Phi(\alpha)$.


Note: In the following problems $h^{-2} = npq$ and $S_n^*$ is the reduced number of successes
defined in (2.19). Finally

(6.7)    $F_n(x) = P\{S_n^* > x\}.$


12. If z varies as a function of n so that z*+*h — 0 but x — », then ¹²

Fa) _
(68) gT + oe"),
where o(z°) stands for terms that are of smaller order of magnitude than 2°.


11 A. Wald and J. Wolfowitz, On a test whether two samples are from the same
population, Annals of Mathematical Statistics, vol. 11 (1940), pp. 147-162. For
more general results, see A. M. Mood, The distribution theory of runs, ibid., pp.
367-392.

12 N. Smirnov, Über Wahrscheinlichkeiten grosser Abweichungen (in Russian,
German summary), Recueil Mathématique [Sbornik] Moscou, vol. 40 (1933), pp-

13. If $x^3 h \to 0$, $x \to \infty$, then¹³ for any constant a > 0

(6.9)    $\frac{F_n(x) - F_n(x + a/x)}{F_n(x)} \to 1 - e^{-a}.$

In words, the conditional probability of $x < S_n^* \le x + a/x$, given that
$S_n^* > x$, tends to $1 - e^{-a}$. [Hint: Use (5.2).]

14. Probabilities of large deviations. Starting with (2.4), prove the following
theorem. If $n \to \infty$, and k varies so that $(k - np)/n \to 0$, then

(6.10)    $b(k; n, p) \sim h\phi(x) \, e^{-f(x)}$

where $x = (k - np)h$ and

(6.11)    $f(x) = \sum_{\nu=3}^{\infty} \frac{p^{\nu-1} - (-q)^{\nu-1}}{\nu(\nu - 1)} \, x^{\nu} h^{\nu-2}.$

Note: If $x^3 h \to 0$, then $f(x) \to 0$, and (6.10) reduces to (2.11). If x is of the
order of magnitude of $h^{-1/3}$ but negligible as compared to $h^{-1/2}$, then

(6.12)    $f(x) \approx \frac{p - q}{6} \, x^3 h.$

If x is of the order of magnitude of $h^{-1/2}$, then

(6.13)    $f(x) \approx \frac{p - q}{6} \, x^3 h + \frac{p^3 + q^3}{12} \, x^4 h^2,$

etc.

15. Continuation. Prove that if $x \to \infty$, $xh \to 0$,

(6.14)    $f\left(x + \frac{a}{x}\right) - f(x) \to 0$

and hence

(6.15)    $F_n(x) \sim e^{-f(x)} \{1 - \Phi(x)\}.$


16. Deduce (6.9) from (6.15), assuming only $xh \to 0$.

17. If p > q, then for large x

(6.16)    $P\{S_n^* > x\} < P\{S_n^* < -x\}.$

(Hint: Use problem 14.)


18. A new derivation of the law of large numbers. Show that the law of large 
numbers is a consequence of the DeMoivre-Laplace limit theorem. 


13 A. Khintchine, Über einen neuen Grenzwertsatz der Wahrscheinlichkeitsrechnung,
Mathematische Annalen, vol. 101 (1929), pp. 745-752. See also problem 16.


19. A new derivation of the normal approximation. Starting from VI(10.11)-
(10.13), prove that when $n \to \infty$ and $k \to \infty$ in such a way that $\xi_k^3 n^{-1/2} \to 0$,
we have

(6.17)    $b(k; n, p) \sim b(m; n, p) \, e^{-\xi_k^2/2}.$

20. If np < m < (n+1)p, show that

(6.18)    $b\left(m; n, \frac{m+1}{n+1}\right) \le b(m; n, p) \le b\left(m; n, \frac{m}{n}\right).$

If (n+1)p - 1 < m < np, the same inequality holds with (m + 1)/(n + 1)
in the extreme left member replaced by m/(n + 1).


21. Conclude that $b(m; n, p) \sim \{2\pi(n + 1)pq\}^{-1/2}$ and give upper and lower
bounds.


14 Problems 19 and 20 together imply that $b(k; n, p) \sim \{(n + 1)pq\}^{-1/2} \phi(\xi_k)$. This
is the same as the basic approximation formula (2.11) with $x_k$ replaced by $\xi_k$ and
$h = \{npq\}^{-1/2}$ replaced by $h' = \{(n + 1)pq\}^{-1/2}$. Since $\xi_k \sim x_k$ and $h \sim h'$, the two
formulas are asymptotically equivalent. Actually the new formula involves a
smaller error term (in its derivation the error committed in passing from (2.7) to
(2.8) is avoided). It should also be noted that the calculations required for problems
19 and 20 are simpler and more intuitive than those used in the text; they involve
only the standard estimates for logarithms as used in chapter II, section 8, and
chapter VI, section 3. In short, the new formula and its derivation are superior
to those of the text, but they do not conform to the time-honored use of np instead
of (n + 1)p.


CHAPTER VIII* 


Unlimited Sequences 
of Bernoulli Trials 


This chapter discusses certain properties of randomness and the
important law of the iterated logarithm for Bernoulli trials. A different
aspect of the fluctuation theory of Bernoulli trials (at least for $p = \tfrac{1}{2}$)
is covered in chapter III.


1. INFINITE SEQUENCES OF TRIALS 


In the preceding chapter we have dealt with probabilities connected
with n Bernoulli trials and have studied their asymptotic behavior as
$n \to \infty$. We turn now to a more general type of problem where the
events themselves cannot be defined in a finite sample space.


Example. A problem in runs. Let $\alpha$ and $\beta$ be positive integers, and
consider a potentially unlimited sequence of Bernoulli trials, such as
tossing a coin or throwing dice. Suppose that Paul bets Peter that a
run of $\alpha$ consecutive successes will occur before a run of $\beta$ consecutive
failures. It has an intuitive meaning to speak of the event that Paul
wins, but it must be remembered that in the mathematical theory the
term event stands for "aggregate of sample points" and is meaningless
unless an appropriate sample space has been defined. The model of a
finite number of trials is insufficient for our present purpose, but the
difficulty is solved by a simple passage to the limit. In n trials Peter
wins or loses, or the game remains undecided. Let the corresponding
probabilities be $x_n$, $y_n$, $z_n$ ($x_n + y_n + z_n = 1$). As the number n of
trials increases, the probability $z_n$ of a tie can only decrease, and both
$x_n$ and $y_n$ necessarily increase. Hence $x = \lim x_n$, $y = \lim y_n$, and
$z = \lim z_n$ exist. Nobody would hesitate to call them the probabilities
of Peter's ultimate gain or loss or of a tie. However, the corresponding
three events are defined only in the sample space of infinite sequences
of trials, and this space is not discrete.


* This chapter is not directly connected with the material covered in subsequent
chapters and may be omitted at first reading.


The example was introduced for illustration only, and the numerical values of
$x_n$, $y_n$, $z_n$ are not our immediate concern. We shall return to their calculation in
example XIII(8.b). The limits x, y, z may be obtained by a simpler method which
is applicable to more general cases. We indicate it here because of its importance
and intrinsic interest.

Let A be the event that a run of $\alpha$ consecutive successes occurs before a run of $\beta$
consecutive failures. Then A means Paul's winning and $x = P\{A\}$. If u and v
are the conditional probabilities of A under the hypotheses, respectively, that the
first trial results in success or failure, then $x = pu + qv$ [see V(1.8)]. Suppose
first that the first trial results in success. In this case the event A can occur in
$\alpha$ mutually exclusive ways: (1) The following $\alpha - 1$ trials result in successes; the
probability for this is $p^{\alpha-1}$. (2) The first failure occurs at the $\nu$th trial where
$2 \le \nu \le \alpha$. Let this event be $H_\nu$. Then $P\{H_\nu\} = p^{\nu-2}q$, and $P\{A \mid H_\nu\} = v$.
Hence (using once more the formula for compound probabilities)

(1.1)    $u = p^{\alpha-1} + qv(1 + p + \cdots + p^{\alpha-2}) = p^{\alpha-1} + v(1 - p^{\alpha-1}).$

If the first trial results in failure, a similar argument leads to

(1.2)    $v = pu(1 + q + \cdots + q^{\beta-2}) = u(1 - q^{\beta-1}).$

We have thus two equations for the two unknowns u and v and find for $x = pu + qv$

(1.3)    $x = p^{\alpha-1} \, \frac{1 - q^{\beta}}{p^{\alpha-1} + q^{\beta-1} - p^{\alpha-1}q^{\beta-1}}.$

To obtain y we have only to interchange p and q, and $\alpha$ and $\beta$. Thus

(1.4)    $y = q^{\beta-1} \, \frac{1 - p^{\alpha}}{p^{\alpha-1} + q^{\beta-1} - p^{\alpha-1}q^{\beta-1}}.$

Since x + y = 1, we have z = 0; the probability of a tie is zero.

For example, in tossing a coin ($p = \tfrac{1}{2}$) the probability that a run of two heads
appears before a run of three tails is 0.7; for two consecutive heads before four
consecutive tails the probability is $\tfrac{5}{6}$, for three consecutive heads before four
consecutive tails $\tfrac{15}{22}$. In rolling dice there is probability 0.1753 that two consecutive
aces will appear before five consecutive non-aces, etc.
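
Formulas (1.3) and (1.4) are easy to evaluate exactly. The following sketch is illustrative (the function name `run_before_run` is hypothetical); it uses rational arithmetic from the standard library so that the quoted fractions appear without rounding.

```python
from fractions import Fraction


def run_before_run(p, alpha, beta):
    """Return (x, y) of formulas (1.3) and (1.4).

    x: probability that a run of alpha successes precedes a run of beta failures.
    y: probability of the opposite outcome.
    """
    q = 1 - p
    P = p ** (alpha - 1)
    Q = q ** (beta - 1)
    denom = P + Q - P * Q
    x = P * (1 - q ** beta) / denom
    y = Q * (1 - p ** alpha) / denom
    return x, y


half = Fraction(1, 2)
print(run_before_run(half, 2, 3))          # two heads before three tails
print(run_before_run(half, 2, 4)[0])       # two heads before four tails
print(run_before_run(half, 3, 4)[0])       # three heads before four tails

x_dice, y_dice = run_before_run(Fraction(1, 6), 2, 5)
print(round(float(x_dice), 4), x_dice + y_dice)   # two aces before five non-aces
```

In every case x + y comes out as exactly 1, in agreement with z = 0.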


In the present volume we are confined to the theory of discrete
sample spaces, and this means a considerable loss of mathematical
elegance. The general theory considers n Bernoulli trials only as the
beginning of an infinite sequence of trials. A sample point is then
represented by an infinite sequence of letters S and F, and the sample
space is the aggregate of all such sequences. A finite sequence, like
SSFS, stands for the aggregate of all points with this beginning, that
is, for the compound event that in an infinite sequence of trials the first
four result in S, S, F, S, respectively. In the infinite sample space the
game of our example can be interpreted without a limiting process.
Take any point, that is, a sequence SSFSFF .... In it a run of $\alpha$
consecutive S's may or may not occur. If it does, it may or may not
be preceded by a run of $\beta$ consecutive F's. In this way we get a
classification of all sample points into three classes, representing the events
"Peter wins," "Peter loses," "no decision." Their probabilities are
the numbers x, y, z, computed above. The only trouble with this
sample space is that it is not discrete, and we have not yet defined
probabilities in general sample spaces.

Note that we are discussing a question of terminology rather than a
genuine difficulty. In our example there was no question about the
proper definition or interpretation of the number x. The trouble is
only that for consistency we must either decide to refer to the number
x as "the limit of the probability $x_n$ that Peter wins in n trials" or else
talk of the event "that Peter wins," which means referring to a
non-discrete sample space. We propose to do both. For simplicity of
language we shall refer to events even when they are defined in the
infinite sample space; for precision, the theorems will also be
formulated in terms of finite sample spaces and passages to the limit. The
events to be studied in this chapter share the following salient feature
of our example. The event "Peter wins," although defined in an
infinite space, is the union of the events "Peter wins at the nth trial"
(n = 1, 2, ...), each of which depends only on a finite number of
trials. The required probability x is the limit of a monotonic sequence
of probabilities $x_n$ which depend only on finitely many trials. We
require no theory going beyond the model of n Bernoulli trials; we merely
take the liberty of simplifying clumsy expressions¹ by calling certain
numbers probabilities instead of using the term "limits of probabilities."


2. SYSTEMS OF GAMBLING 


The painful experience of many gamblers has taught us the lesson 
that no system of betting is successful in improving the gambler’s 
chances. If the theory of probability is true to life, this experience 
must correspond to a provable statement. 

For orientation let us consider a potentially unlimited sequence of
Bernoulli trials and suppose that at each trial the bettor has the free
choice of whether or not to bet. A "system" consists in fixed rules
selecting those trials on which the player is to bet. For example, the
bettor may make up his mind to bet at every seventh trial or to wait
as long as necessary for seven heads to occur between two bets. He
may bet only following a head run of length 13, or bet for the first
time after the first head, for the second time after the first run of two
consecutive heads, and generally, for the kth time, just after k heads
have appeared in succession. In the latter case he would bet less and
less frequently. We need not consider the stakes at the individual
trials; we want to show that no "system" changes the bettor's
situation and that he can achieve the same result by betting every time.
It goes without saying that this statement can be proved only for
systems in the ordinary meaning where the bettor does not know the
future (the existence or non-existence of genuine prescience is not our
concern). It must also be admitted that the rule "go home after losing
three times" does change the situation, but we shall rule out such
uninteresting systems.


1 For the reader familiar with general measure theory the situation may be
described as follows. We consider only events which either depend on a finite number
of trials or are limits of monotonic sequences of such events. We calculate the
obvious limits of probabilities and clearly require no measure theory for that purpose.
However, only general measure theory shows that our limits are independent of
the particular passage to the limit and are completely additive.

We define a system as a set of fixed rules which for every trial uniquely
determines whether or not the bettor is to bet; at the kth trial the decision
may depend on the outcomes of the first k - 1 trials, but not on the outcome
of trials number k, k+1, k+2, ...; finally the rules must be such as to
ensure an indefinite continuation of the game. Since the set of rules is
fixed, the event "in n trials the bettor bets more than r times" is well
defined and its probability calculable. The last condition requires that
for every r, as $n \to \infty$, this probability tends to 1.

We now formulate our fundamental theorem to the effect that under 
any system the successive bets form a sequence of Bernoulli trials with 
unchanged probability for success. With an appropriate change of 
phrasing this theorem holds for all kinds of independent trials; the 
successive bets form in each case an exact replica of the original trials, 
so that no system can affect the bettor’s fortunes. The importance of 
this statement was first recognized by von Mises, who introduced the 
impossibility of a successful gambling system as a fundamental axiom. 
The present formulation and proof follow Doob.² For simplicity we
assume that $p = \tfrac{1}{2}$.

Let $A_k$ be the event "first bet occurs at the kth trial." Our
definition of system requires that as $n \to \infty$ the probability tends to one
that the first bet has occurred before the nth trial. This means that
$P\{A_1\} + P\{A_2\} + \cdots + P\{A_n\} \to 1$, or

(2.1)    $\sum P\{A_k\} = 1.$


2 J. L. Doob, Note on probability, Annals of Mathematics, vol. 37 (1936), pp.
363-367.

Next, let $B_k$ be the event "head at kth trial." Then the event B
"when first bet is made the trial results in heads" is the union of the
events $A_1B_1$, $A_2B_2$, $A_3B_3$, ... which are mutually exclusive. Now
$A_k$ depends only on the outcome of the first k - 1 trials, and $B_k$
only on the trial number k. Hence $A_k$ and $B_k$ are independent and
$P\{A_kB_k\} = P\{A_k\}P\{B_k\} = \tfrac{1}{2}P\{A_k\}$. Thus
$P\{B\} = \sum P\{A_kB_k\} = \tfrac{1}{2}\sum P\{A_k\} = \tfrac{1}{2}$. This shows that under this
system the probability of heads at the first bet is $\tfrac{1}{2}$, and the same
statement holds for all subsequent bets.

It remains to show that the bets are stochastically independent.
This means that the probability that the coin falls heads at both the
first and the second bet should be $\tfrac{1}{4}$ (and similarly for all other
combinations and for the subsequent trials). To verify this statement let
$A_k^*$ be the event that the second bet occurs at the kth trial. Let E
represent the event "heads at the first two bets"; it is the union of all
events $A_jB_jA_k^*B_k$ where j < k (if $j \ge k$, then $A_j$ and $A_k^*$ are mutually
exclusive and $A_jA_k^* = 0$). Therefore

(2.2)    $P\{E\} = \sum_{j=1}^{\infty} \sum_{k=j+1}^{\infty} P\{A_jB_jA_k^*B_k\}.$

As before, we see that for fixed j and k > j, the event $B_k$ (heads at
kth trial) is independent of the event $A_jB_jA_k^*$ (which depends only on
the outcomes of the first k - 1 trials). Hence

(2.3)    $P\{E\} = \tfrac{1}{2} \sum_{j=1}^{\infty} \sum_{k=j+1}^{\infty} P\{A_jB_jA_k^*\} = \tfrac{1}{2} \sum_{j=1}^{\infty} P\{A_jB_j\} \sum_{k=j+1}^{\infty} P\{A_k^* \mid A_jB_j\}$

[cf. V(1.8)]. Now, whenever the first bet occurs and whatever its
outcome, the game is sure to continue, that is, the second bet occurs sooner
or later. This means that for given $A_jB_j$ with $P\{A_jB_j\} > 0$ the
conditional probabilities that the second bet occurs at the kth trial must
add to unity. The second series in (2.3) is therefore unity, and we
have already seen that $\sum P\{A_jB_j\} = \tfrac{1}{2}$. Hence $P\{E\} = \tfrac{1}{4}$ as
contended. A similar argument holds for any combination of trials.
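
The theorem can be illustrated, though of course not proved, by simulation. The sketch below is an assumption-laden example rather than anything from the text: it picks one particular system (bet only immediately after two consecutive heads), a fixed random seed, and hypothetical names. It then checks that the frequency of heads at the bets stays near $\tfrac{1}{2}$ and that two successive bets both give heads about a quarter of the time.

```python
import random


def simulate_system(n_trials, run_length=2, p=0.5, seed=12345):
    """Outcomes of the trials on which the bettor bets.

    System: bet on a trial only if the preceding `run_length` trials
    were all heads. The decision uses past outcomes only.
    """
    rng = random.Random(seed)
    outcomes_at_bets = []
    current_run = 0                      # length of the current run of heads
    for _ in range(n_trials):
        head = rng.random() < p
        if current_run >= run_length:    # rule depends on earlier trials only
            outcomes_at_bets.append(head)
        current_run = current_run + 1 if head else 0
    return outcomes_at_bets


bets = simulate_system(400_000)
freq_heads = sum(bets) / len(bets)

# Non-overlapping pairs of successive bets, to look at independence
pairs = list(zip(bets[0::2], bets[1::2]))
freq_both = sum(1 for first, second in pairs if first and second) / len(pairs)

print(len(bets), round(freq_heads, 3), round(freq_both, 3))
```

Note that under this system a head at one bet makes the very next trial a bet as well; the frequencies nevertheless behave as for ordinary coin tossing.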


Note that the situation is different when the player is permitted to 
vary arbitrarily the amounts which he puts down. With systems de- 
pending on the accumulated gain, there exist advantageous strategies, 
and the game depends on the strategy. We shall return to this point
in chapter XIV, section 2.


3. THE BOREL-CANTELLI LEMMAS


Two simple lemmas concerning infinite sequences of trials are used 
so frequently that they deserve special attention. We formulate them 
for Bernoulli trials, but they apply to more general cases. 

We refer again to an infinite sequence of Bernoulli trials. Let $A_1$,
$A_2$, ... be an infinite sequence of events each of which depends only
on a finite number of trials; in other words, we suppose that there
exists an integer $n_k$ such that $A_k$ is an event in the sample space of
the first $n_k$ Bernoulli trials. Put

(3.1)    $a_k = P\{A_k\}.$

(For example, $A_k$ may be the event that the 2kth trial concludes a run
of at least k consecutive successes. Then $n_k = 2k$ and $a_k = p^k$.)

For every infinite sequence of letters S and F it is possible to
establish whether it belongs to 0, 1, 2, ... or infinitely many among the
$\{A_k\}$. This means that we can speak of the event $U_r$ that an unending
sequence of trials produces more than r among the events $\{A_k\}$,
and also of the event $U_\infty$ that infinitely many among the $\{A_k\}$ occur.
The event $U_r$ is defined only in the infinite sample space, and its
probability is the limit of $P\{U_{n,r}\}$, the probability that n trials produce
more than r among the events $\{A_k\}$. Finally, $P\{U_\infty\} = \lim P\{U_r\}$;
this limit exists since $P\{U_r\}$ decreases as r increases.

Lemma 1. If $\sum a_k$ converges, then with probability one only finitely
many events $A_k$ occur. More precisely, it is claimed that for $r$ sufficiently
large, $P\{U_r\} < \varepsilon$ or: to every $\varepsilon > 0$ it is possible to find an integer $r$ such
that the probability that $n$ trials produce one or more among the events
$A_{r+1}$, $A_{r+2}$, ... is less than $\varepsilon$ for all $n$.


Proof. Determine $r$ so that $a_{r+1} + a_{r+2} + \cdots < \varepsilon$; this is possible
since $\sum a_k$ converges. Without loss of generality we may suppose that
the $A_k$ are ordered in such a way that $n_1 \le n_2 \le n_3 \le \cdots$. Let $N$ be
the last subscript for which $n_N \le n$. Then $A_1$, ..., $A_N$ are defined in
the space of $n$ trials, and the lemma asserts that the probability that
one or more among the events $A_{r+1}$, $A_{r+2}$, ..., $A_N$ occur is less than $\varepsilon$.
This is true, since by the fundamental inequality I(7.6) we have


(3.2)    $P\{A_{r+1} \cup A_{r+2} \cup \cdots \cup A_N\} \le a_{r+1} + a_{r+2} + \cdots + a_N < \varepsilon,$

as contended.
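The choice of $r$ in the proof can be made concrete for the example $a_k = p^k$ mentioned after (3.1). The following is a minimal sketch; the function names are hypothetical, and the closed form for the geometric tail is the only formula used.

```python
def tail_sum(p: float, r: int) -> float:
    """Sum of a_k = p**k over k = r+1, r+2, ... (a geometric tail)."""
    return p ** (r + 1) / (1.0 - p)


def smallest_r(p: float, eps: float) -> int:
    """Smallest r >= 0 for which a_{r+1} + a_{r+2} + ... is below eps."""
    r = 0
    while tail_sum(p, r) >= eps:
        r += 1
    return r


if __name__ == "__main__":
    p, eps = 0.5, 0.01
    r = smallest_r(p, eps)
    # By (3.2), no number n of trials produces any of A_{r+1}, A_{r+2}, ...
    # with probability exceeding this tail sum.
    print(r, tail_sum(p, r))
```

For any such $r$ the bound (3.2) is uniform in $n$, which is the point of the lemma.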


A satisfactory converse to the lemma is known only for the special
case of mutually independent $A_k$. This situation occurs when the trials
are divided into non-overlapping blocks and $A_k$ depends only on the
trials in the $k$th block (for example, $A_k$ may be the event that the $k$th
thousand of trials produces more than 600 successes).


Lemma 2. If the events $A_k$ are mutually independent, and if $\sum a_k$
diverges, then with probability one infinitely many $A_k$ occur. In other
words, it is claimed that for every $r$ the probability that $n$ trials
produce more than $r$ among the events $\{A_k\}$ tends to 1 as $n \to \infty$.


Proof. As in the proof of lemma 1 let $A_1$, $A_2$, ..., $A_N$ be the
events defined in the sample space of $n$ trials. The probability
that none of them occurs is, because of the assumed independence,
$(1 - a_1)(1 - a_2) \cdots (1 - a_N)$. Now $1 - x < e^{-x}$ for $0 < x < 1$, and
hence $(1 - a_1)(1 - a_2) \cdots (1 - a_N) < e^{-(a_1 + \cdots + a_N)}$; with
increasing $N$ the last quantity tends to zero. We have thus proved that with
probability one at least one among the $\{A_k\}$ occurs.

Next, divide the sequence $\{A_k\}$ into two subsequences $\{A_k^*\}$ and
$\{A_k^{**}\}$ so that both series $\sum P\{A_k^*\}$ and $\sum P\{A_k^{**}\}$ diverge. Applying
our result to these subsequences we find that, with probability one, at
least one $A_k^*$ and one $A_k^{**}$ occur. Therefore there is probability one
that at least two among the $\{A_k\}$ occur. Applying, in turn, this
statement to the sequences $\{A_k^*\}$ and $\{A_k^{**}\}$ we find that at least four
among the $\{A_k\}$ are bound to occur, etc.
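The first step of the proof can be watched numerically. The sketch below takes the divergent series $a_k = 1/k$ ($k = 2, 3, \ldots$) as an illustrative assumption and compares the probability that none of the first events occurs with the exponential bound; the function names are hypothetical.

```python
import math


def prob_none(a):
    """Probability that none of the independent events occurs: prod(1 - a_k)."""
    prod = 1.0
    for a_k in a:
        prod *= 1.0 - a_k
    return prod


def exp_bound(a):
    """Upper bound exp(-(a_1 + ... + a_N)) used in the proof of lemma 2."""
    return math.exp(-sum(a))


if __name__ == "__main__":
    for n_events in (10, 100, 1000, 10000):
        a = [1.0 / k for k in range(2, n_events + 2)]   # a_k = 1/k, a divergent series
        print(n_events, prob_none(a), exp_bound(a))
```

For this particular choice the product telescopes, so both columns can be seen to tend to zero as more events are included.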


Example. What is the probability that in a sequence of Bernoulli
trials the pattern SFS appears infinitely often? Let $A_k$ be the event
that the trials number $k$, $k + 1$, and $k + 2$ produce the sequence SFS.
The events $A_k$ are obviously not mutually independent, but the
sequence $A_1$, $A_4$, $A_7$, $A_{10}$, ... contains only mutually independent
events (since no two depend on the outcome of the same trials). Since
$a_k = p^2 q$ is independent of $k$, the series $a_1 + a_4 + a_7 + \cdots$ diverges,
and hence with probability one the pattern SFS occurs infinitely often.
A similar argument obviously applies for arbitrary patterns. (For
further examples see problems 4 and 5.)
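A small simulation may make the example tangible. This is only a sketch under stated assumptions: the helper names, the seed, and the use of Python's pseudo-random generator are illustrative choices, and a finite run can of course only suggest the limiting statement.

```python
import random


def count_pattern(trials: str, pattern: str = "SFS") -> int:
    """Number of positions k at which the pattern starts (overlaps allowed)."""
    m = len(pattern)
    return sum(1 for k in range(len(trials) - m + 1) if trials[k:k + m] == pattern)


def simulate_trials(n: int, p: float, seed: int = 0) -> str:
    """n pseudo-random Bernoulli trials written as a string of letters S and F."""
    rng = random.Random(seed)
    return "".join("S" if rng.random() < p else "F" for _ in range(n))


if __name__ == "__main__":
    p = 0.5
    for n in (100, 10_000, 100_000):
        trials = simulate_trials(n, p)
        # Observed count next to the expected count (n - 2) * p^2 * q.
        print(n, count_pattern(trials), (n - 2) * p * p * (1 - p))
```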


4. THE STRONG LAW OF LARGE NUMBERS


The intuitive notion of probability is based on the expectation that
the following is true: If the number of successes in the first $n$ trials of
a sequence of Bernoulli trials is $S_n$, then


(4.1)    $\frac{S_n}{n} \to p.$


In the abstract theory this cannot be true for every sequence of trials;
in fact, our sample space contains a point representing the conceptual
possibility of an infinite sequence of uninterrupted successes, and for
it $S_n/n = 1$. However, it is demonstrable that (4.1) holds with
probability one, so that the cases where (4.1) does not hold form a negligible
exception.

Note that we deal with a statement much stronger than the weak
law of large numbers [VI(4.2)]. The latter says that for every
sufficiently large fixed $n$ the average $S_n/n$ is likely to be near $p$, but it does
not say that $S_n/n$ is bound to stay near $p$ if the number of trials is
increased. It leaves open the possibility that in $n$ additional trials at
least one of the events $S_{n+1}/(n+1) < p - \varepsilon$, or $S_{n+2}/(n+2) < p - \varepsilon$,
..., or $S_{2n}/2n < p - \varepsilon$, occurs; the probability of this is the sum of a
large number of probabilities of which we know only that they are
individually small. We shall now prove that with probability one
$S_n/n - p$ becomes and remains small.


Strong Law of Large Numbers. For every $\varepsilon > 0$ we have
probability one that only finitely many of the events


(4.2)    $\left| \frac{S_n}{n} - p \right| > \varepsilon$


occur. This implies that (4.1) holds with probability one. In terms
of finite sample spaces, it is asserted that to every $\varepsilon > 0$, $\delta > 0$ there
corresponds an $r$ such that for all $\nu$ the probability of the simultaneous
realization of the $\nu$ inequalities


(4.3)    $\left| \frac{S_{r+k}}{r+k} - p \right| < \varepsilon, \qquad k = 1, 2, \ldots, \nu,$


is greater than $1 - \delta$.


Proof. We shall prove a much stronger statement. Let $A_k$ be the
event


(4.4)    $\left| \frac{S_k - kp}{(kpq)^{1/2}} \right| \ge (2a \log k)^{1/2},$


where $a > 1$. It is then obvious from VII(5.2) that, at least for all $k$
sufficiently large,


(4.5)    $P\{A_k\} < e^{-a \log k} = \frac{1}{k^a}.$


Hence $\sum P\{A_k\}$ converges, and lemma 1 of the preceding section ensures
that with probability one only finitely many inequalities (4.4) hold. On
the other hand, if (4.2) holds, then


(4.6)    $\left| \frac{S_n - np}{(npq)^{1/2}} \right| > \frac{\varepsilon n^{1/2}}{(pq)^{1/2}},$


and for large $n$ the right side is larger than $(2a \log n)^{1/2}$. Hence, the
realization of infinitely many inequalities (4.2) implies the realization
of infinitely many $A_k$ and has therefore probability zero.
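The statement that only finitely many of the events (4.2) occur can be illustrated on a simulated sequence. The sketch below is an assumption-laden illustration (hypothetical helper names, a fixed seed, a finite horizon); it records the last $n$ within the simulated run at which $S_n/n$ deviates from $p$ by more than $\varepsilon$.

```python
import random


def running_frequencies(outcomes):
    """Return [S_1/1, S_2/2, ..., S_n/n] for a sequence of 0/1 outcomes."""
    freqs, successes = [], 0
    for n, outcome in enumerate(outcomes, start=1):
        successes += outcome
        freqs.append(successes / n)
    return freqs


def last_exceedance(freqs, p, eps):
    """Largest n with |S_n/n - p| > eps among the given frequencies, or 0 if none."""
    last = 0
    for n, f in enumerate(freqs, start=1):
        if abs(f - p) > eps:
            last = n
    return last


if __name__ == "__main__":
    p, eps, n_trials = 0.3, 0.01, 200_000
    rng = random.Random(1)
    outcomes = [1 if rng.random() < p else 0 for _ in range(n_trials)]
    freqs = running_frequencies(outcomes)
    print("last n with |S_n/n - p| >", eps, ":", last_exceedance(freqs, p, eps))
```

A finite simulation cannot prove the theorem; it only shows the typical picture in which the exceedances stop early relative to the length of the run.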

The strong law of large numbers was first formulated by Cantelli 
(1917), after Borel and Hausdorff had discussed certain special cases. 
Like the weak law, it is only a very special case of a general theorem 
on random variables. Taken in conjunction with our theorem on the 
impossibility of gambling systems, the law of large numbers implies 
the existence of the limit (4.1) not only for the original sequence of
trials but also for all subsequences obtained in accordance with the 
rules of section 2. Thus the two theorems together describe the funda- 
mental properties of randomness which are inherent in the intuitive notion 
of probability and whose importance was stressed with special emphasis 
by von Mises.

5. THE LAW OF THE ITERATED LOGARITHM 

As in chapter VII let us again introduce the reduced number of
successes in $n$ trials


(5.1)    $S_n^* = \frac{S_n - np}{(npq)^{1/2}}.$


The Laplace limit theorem asserts that $P\{S_n^* > x\} \sim 1 - \Phi(x)$.
Thus, for every particular value of $n$ it is improbable to have a large
$S_n^*$, but it is intuitively clear that in a prolonged sequence of trials
$S_n^*$ will sooner or later take on arbitrarily large values. Moderate
values of $S_n^*$ are most probable, but the maxima will slowly increase.
How fast? In the course of the proof of the strong law of large numbers
we have concluded from (4.5) that with probability one the inequality
$S_n^* < (2a \log n)^{1/2}$ holds for each $a > 1$ and all sufficiently large $n$.
This provides us with an upper bound for the fluctuations of $S_n^*$, but
this bound is bad. To see this, let us apply the same argument to the
subsequence $S_2^*$, $S_4^*$, $S_8^*$, $S_{16}^*$, ...; that is, let us define the event $A_k$
by $S_{2^k}^* \ge (2a \log k)^{1/2}$. The inequality (4.5) now implies that
$S_{2^k}^* < (2a \log k)^{1/2}$ for $a > 1$ and all sufficiently large $k$. But for $n = 2^k$ we
have $\log k \sim \log \log n$, and we conclude that for each $a > 1$ and all $n$
of the form $n = 2^k$ the inequality


(5.2)    $S_n^* < (2a \log \log n)^{1/2}$


will hold from some $k$ onward. It is now a fair guess that in reality
(5.2) holds for all $n$ sufficiently large and, in fact, this is one part of
the law of the iterated logarithm. This remarkable theorem ³ asserts
that $(2 \log \log n)^{1/2}$ is the precise upper bound in the sense that for each
$a < 1$ the reverse of the inequality (5.2) will hold for infinitely many $n$.
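To see how much is gained by passing from $\log n$ to $\log \log n$, one may tabulate the two bounds along $n = 2^k$. The following sketch uses hypothetical function names and sets $a = 1$ merely for the comparison.

```python
import math


def log_bound(n: int, a: float = 1.0) -> float:
    """The bound (2 a log n)^(1/2) obtained from (4.5)."""
    return math.sqrt(2.0 * a * math.log(n))


def iterated_log_bound(n: int, a: float = 1.0) -> float:
    """The bound (2 a log log n)^(1/2) of (5.2); requires n >= 3."""
    return math.sqrt(2.0 * a * math.log(math.log(n)))


if __name__ == "__main__":
    for k in (4, 10, 20, 40, 80):
        n = 2 ** k
        print(k, round(log_bound(n), 3), round(iterated_log_bound(n), 3))
```

The second column grows so slowly that even for astronomically large $n$ the reduced variable $S_n^*$ is confined to a band only a few units wide.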


Theorem. With probability one we have


(5.3)    $\limsup_{n \to \infty} \frac{S_n^*}{(2 \log \log n)^{1/2}} = 1.$


This means: For $\lambda > 1$ with probability one only finitely many of the
events


(5.4)    $S_n > np + \lambda (2npq \log \log n)^{1/2}$


occur; for $\lambda < 1$ with probability one (5.4) holds for infinitely many $n$.
For reasons of symmetry equation (5.3) implies that


(5.3a)    $\liminf_{n \to \infty} \frac{S_n^*}{(2 \log \log n)^{1/2}} = -1.$


Proof. We start with two preliminary remarks.
(1) There exists a constant $c > 0$ which depends on $p$, but not on $n$,
such that


(5.5)    $P\{S_n > np\} > c$


for all $n$. In fact, an inspection of the binomial distribution shows that
the left side in (5.5) is never zero, and the Laplace limit theorem shows
that it tends to 1/2 as $n \to \infty$. Accordingly, the left side is bounded
away from zero, as asserted.

(2) We require the following lemma: Let $x$ be fixed, and let $A$ be
the event that for at least one $k$ with $k \le n$


(5.6)    $S_k - kp > x.$

Then

(5.7)    $P\{A\} \le c^{-1} P\{S_n - np > x\}.$


³ A. Khintchine, Über einen Satz der Wahrscheinlichkeitsrechnung, Fundamenta
Mathematicae, vol. 6 (1924), pp. 9-20. The discovery was preceded by partial
results due to other authors. The present proof is arranged so as to permit
straightforward generalization to more general random variables.


For a proof of the lemma let $A_\nu$ be the event that (5.6) holds for
$k = \nu$ but not for $k = 1, 2, \ldots, \nu - 1$ (here $1 \le \nu \le n$). The events
$A_1$, $A_2$, ..., $A_n$ are mutually exclusive, and $A$ is their union. Hence


(5.8)    $P\{A\} = P\{A_1\} + \cdots + P\{A_n\}.$


Next, for $\nu < n$ let $U_\nu$ be the event that the total number of successes
in the trials number $\nu + 1$, $\nu + 2$, ..., $n$ exceeds $(n - \nu)p$. If both
$A_\nu$ and $U_\nu$ occur, then $S_n > S_\nu + (n - \nu)p > np + x$, and since the
$A_\nu U_\nu$ are mutually exclusive, this implies


(5.9)    $P\{S_n - np > x\} \ge P\{A_1 U_1\} + P\{A_2 U_2\} + \cdots + P\{A_{n-1} U_{n-1}\} + P\{A_n\}.$


Now $A_\nu$ depends only on the first $\nu$ trials and $U_\nu$ only on the following $n - \nu$
trials. Hence $A_\nu$ and $U_\nu$ are independent, and $P\{A_\nu U_\nu\} = P\{A_\nu\} P\{U_\nu\}$.
From the preliminary remark (5.5) we know that $P\{U_\nu\} > c$, and since
$c < 1$, we get from (5.9) and (5.8)


(5.10)    $P\{S_n - np > x\} \ge c \sum P\{A_\nu\} = c P\{A\}.$

This proves (5.7).
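For small $n$ the lemma can be checked exactly by enumerating the sample space. The sketch below is illustrative only: the function name is hypothetical, and $c$ is taken as the smallest of the values $P\{S_m > mp\}$ for $m = 1, \ldots, n - 1$, which is all the proof of (5.10) requires for a fixed $n$.

```python
from itertools import product


def check_maximal_inequality(n: int, p: float, x: float):
    """Enumerate all 2**n outcomes (n >= 2) and return (P{A}, P{S_n - np > x}, c)."""
    q = 1.0 - p
    p_max = 0.0                    # P{A}: S_k - kp > x for at least one k <= n
    p_end = 0.0                    # P{S_n - np > x}
    p_above = [0.0] * (n + 1)      # p_above[m] = P{S_m > mp}
    for outcome in product((0, 1), repeat=n):
        weight = 1.0
        for o in outcome:
            weight *= p if o else q
        partial, s = [], 0
        for o in outcome:
            s += o
            partial.append(s)      # partial[k - 1] = S_k
        if any(partial[k - 1] - k * p > x for k in range(1, n + 1)):
            p_max += weight
        if partial[-1] - n * p > x:
            p_end += weight
        for m in range(1, n):
            if partial[m - 1] > m * p:
                p_above[m] += weight
    c = min(p_above[1:n])
    return p_max, p_end, c


if __name__ == "__main__":
    p_max, p_end, c = check_maximal_inequality(10, 0.5, 1.0)
    print(p_max, p_end / c)        # (5.7) asserts that the first does not exceed the second
```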


(3) We now prove the part of the theorem relating to (5.4) with
$\lambda > 1$. Let $\gamma$ be a number such that


(5.11)    $1 < \gamma < \lambda,$


and let $n_r$ be the integer nearest to $\gamma^r$ ($r = 1, 2, \ldots$). Let $B_r$ be the
event that the inequality


(5.12)    $S_n - np > \lambda (2 n_r pq \log \log n_r)^{1/2}$


holds for at least one $n$ with $n_r \le n < n_{r+1}$. Obviously (5.4) can hold
for infinitely many $n$ only if infinitely many $B_r$ occur. Using the first
Borel-Cantelli lemma, we see therefore that it suffices to prove that


(5.13)    $\sum P\{B_r\}$ converges.


By the inequality (5.7)


(5.14)    $P\{B_r\} \le c^{-1} P\{S_{n_{r+1}} - n_{r+1} p > \lambda (2 n_r pq \log \log n_r)^{1/2}\}$
          $= c^{-1} P\left\{ S_{n_{r+1}}^* > \lambda \left( 2 \frac{n_r}{n_{r+1}} \log \log n_r \right)^{1/2} \right\}.$


Now $n_r / n_{r+1} \sim \gamma^{-1} > \lambda^{-1}$, and hence for sufficiently large $r$


(5.15)    $P\{B_r\} \le c^{-1} P\{S_{n_{r+1}}^* > (2 \lambda \log \log n_r)^{1/2}\}.$


From formula VII(5.2) we get, therefore, for large $r$,


(5.16)    $P\{B_r\} \le c^{-1} e^{-\lambda \log \log n_r} = \frac{1}{c (\log n_r)^{\lambda}} \sim \frac{1}{c (r \log \gamma)^{\lambda}}.$


Since $\lambda > 1$, the assertion (5.13) is proved.

(4) Finally, we prove the assertion concerning (5.4) with $\lambda < 1$.
This time we choose for $\gamma$ an integer so large that


(5.17)    $\frac{\gamma - 1}{\gamma} > \eta > \lambda,$

where $\eta$ is a constant to be determined later, and put $n_r = \gamma^r$. The
second Borel-Cantelli lemma applies only to independent events, and
for this reason we introduce


(5.18)    $D_r = S_{n_r} - S_{n_{r-1}};$


$D_r$ is the total number of successes following trial number $n_{r-1}$ and
up to and including trial $n_r$; for it we have the binomial distribution
$b(k; n, p)$ with $n = n_r - n_{r-1}$. Let $A_r$ be the event


(5.19)    $D_r - (n_r - n_{r-1}) p > \eta (2pq n_r \log \log n_r)^{1/2}.$


We claim that with probability one infinitely many $A_r$ occur. Since the
various $A_r$ depend on non-overlapping blocks of trials (namely,
$n_{r-1} < n \le n_r$), they are mutually independent, and, according to
the second Borel-Cantelli lemma, it suffices to prove that $\sum P\{A_r\}$
diverges. Now


(5.20)    $P\{A_r\} = P\left\{ \frac{D_r - (n_r - n_{r-1}) p}{\{(n_r - n_{r-1}) pq\}^{1/2}} > \eta \left( 2 \frac{n_r}{n_r - n_{r-1}} \log \log n_r \right)^{1/2} \right\}.$


Here $n_r / (n_r - n_{r-1}) = \gamma / (\gamma - 1) < \eta^{-1}$, by (5.17). Hence


(5.21)    $P\{A_r\} \ge P\left\{ \frac{D_r - (n_r - n_{r-1}) p}{\{(n_r - n_{r-1}) pq\}^{1/2}} > (2 \eta \log \log n_r)^{1/2} \right\}.$


Using again the estimate (5.2) of chapter VII, we find for large $r$


(5.22)    $P\{A_r\} > \frac{1}{2 \eta \log \log n_r} e^{-\eta \log \log n_r} = \frac{1}{2 \eta (\log \log n_r)(\log n_r)^{\eta}}.$


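Since $n_r = \gamma^r$ and $\eta < 1$, we find that for large $r$ we have $P\{A_r\} > 1/r$,
which proves the divergence of $\sum P\{A_r\}$.

The last step of the proof consists in showing that $S_{n_{r-1}}$ in (5.18) can
be neglected. From the first part of the theorem, which has already
been proved, we know that to every $\varepsilon > 0$ we can find an $N$ so that, with
probability $1 - \varepsilon$ or better, for all $r > N$


(5.23)    $|S_{n_{r-1}} - n_{r-1} p| < 2 (2pq n_{r-1} \log \log n_{r-1})^{1/2}.$


Now suppose that $\eta$ is chosen so close to 1 that


(5.24)    $1 - \eta < \left( \frac{\eta - \lambda}{2} \right)^2.$


Then from (5.17)


(5.25)    $4 n_{r-1} = \frac{4 n_r}{\gamma} < n_r (\eta - \lambda)^2,$


and hence (5.23) implies


(5.26)    $S_{n_{r-1}} - n_{r-1} p > -(\eta - \lambda)(2pq n_r \log \log n_r)^{1/2}.$


Adding (5.26) to (5.19), we obtain (5.4) with $n = n_r$. It follows that,
with probability $1 - \varepsilon$ or better, this inequality holds for infinitely
many $r$, and this accomplishes the proof.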
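That constants $\eta$ and $\gamma$ with the properties (5.17) and (5.24) exist for every $\lambda < 1$ can be checked mechanically. The sketch below assumes $0 < \lambda < 1$; the particular choice $\eta = 1 - ((1 - \lambda)/4)^2$ and the function names are assumptions of this illustration, not part of the proof.

```python
import math


def choose_parameters(lam: float):
    """Return (eta, gamma) satisfying (5.17) and (5.24) for 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    eta = 1.0 - ((1.0 - lam) / 4.0) ** 2               # illustrative choice of eta
    gamma = math.ceil(1.0 / (1.0 - eta)) + 1           # an integer exceeding 1 / (1 - eta)
    return eta, gamma


def conditions_hold(lam: float, eta: float, gamma: int) -> bool:
    """Check (gamma - 1)/gamma > eta > lam and 1 - eta < ((eta - lam)/2)^2."""
    cond_517 = (gamma - 1) / gamma > eta > lam
    cond_524 = 1.0 - eta < ((eta - lam) / 2.0) ** 2
    return cond_517 and cond_524


if __name__ == "__main__":
    for lam in (0.25, 0.5, 0.75, 0.9):
        eta, gamma = choose_parameters(lam)
        print(lam, eta, gamma, conditions_hold(lam, eta, gamma))
```

The closer $\lambda$ is to 1, the larger $\gamma$ must be, so the blocks $n_{r-1} < n \le n_r$ become very long.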

The law of the iterated logarithm for Bernoulli trials is a special
case of a more general theorem first formulated by Kolmogorov.⁴ At
present it is possible to formulate stronger theorems (cf. problems 7
and 8).


6. INTERPRETATION IN NUMBER THEORY LANGUAGE


Let $x$ be a real number in the interval $0 \le x < 1$, and let


(6.1)    $x = .a_1 a_2 a_3 \ldots$


be its decimal expansion (so that each $a_j$ stands for one of the digits
0, 1, ..., 9). This expansion is unique except for numbers of the form
$a/10^n$ (where $a$ is an integer), which can be written either by means of
an expansion containing infinitely many zeros or by means of an
expansion containing infinitely many nines. To avoid ambiguities we
now agree not to use the latter form.

The decimal expansions are connected with Bernoulli trials with
$p = 1/10$, the digit 0 representing success and all other digits failure.
If we replace in (6.1) all zeros by the letter S and all other digits by F,
then (6.1) represents a possible outcome of an infinite sequence of
Bernoulli trials with $p = 1/10$. Conversely, an arbitrary sequence of
letters S and F can be obtained in the described manner from the
expansion of certain numbers $x$. In this way every event in the sample
space of Bernoulli trials is represented by a certain aggregate of
numbers $x$. For example, the event “success at the $n$th trial” is
represented by all those $x$ whose $n$th decimal is zero. This is an aggregate
of $10^{n-1}$ intervals each of length $10^{-n}$, and the total length of these
intervals equals 1/10, which is the probability of our event. Every
particular finite sample sequence of length $n$ corresponds to an
aggregate of certain intervals; for example, the sequence SFS is represented
by the nine intervals $0.01 \le x < 0.011$, $0.02 \le x < 0.021$, ...,
$0.09 \le x < 0.091$. The probability of each such sample sequence
equals the total length of the corresponding intervals on the $x$-axis.
Probabilities of more complicated events are always expressed in terms
of probabilities of finite sample sequences, and the calculation proceeds
according to the same addition rule that is valid for the familiar
Lebesgue measure on the $x$-axis. Accordingly, our probabilities will
always coincide with the measure of the corresponding aggregate of
points on the $x$-axis. We have thus a means of translating all limit
theorems for Bernoulli trials with $p = 1/10$ into theorems concerning
decimal expansions. The phrase “with probability one” is equivalent
to “for almost all $x$” or “almost everywhere.”

⁴ A. Kolmogoroff, Das Gesetz des iterierten Logarithmus, Mathematische Annalen,
vol. 101 (1929), pp. 126-135.
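The translation between numbers and outcomes can be carried out mechanically. The following sketch uses exact rational arithmetic to avoid rounding of decimals; the function names are hypothetical and the code is meant only as an illustration of the correspondence described above.

```python
from fractions import Fraction


def decimal_digits(x: Fraction, n: int):
    """First n decimal digits of x (0 <= x < 1), using the expansion ending in zeros."""
    digits = []
    for _ in range(n):
        x *= 10
        d = int(x)          # the integer part is the next digit
        digits.append(d)
        x -= d
    return digits


def as_trials(x: Fraction, n: int) -> str:
    """Translate digits into outcomes: digit 0 gives S (success), any other digit F."""
    return "".join("S" if d == 0 else "F" for d in decimal_digits(x, n))


def sfs_measure() -> Fraction:
    """Total length of the nine intervals 0.0d <= x < 0.0d1, d = 1, ..., 9."""
    lengths = (Fraction(10 * d + 1, 1000) - Fraction(10 * d, 1000) for d in range(1, 10))
    return sum(lengths, Fraction(0))


if __name__ == "__main__":
    print(as_trials(Fraction(1, 100), 3))      # x = 0.010 lies in the first interval
    print(sfs_measure(), Fraction(1, 10) * Fraction(9, 10) * Fraction(1, 10))
```

The last line compares the total length of the intervals with the probability $pqp$ of the sample sequence SFS.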

We have considered the random variable $S_n$ which gives the number
of successes in $n$ trials. Here it is more convenient to emphasize the
fact that $S_n$ is a function of the sample point, and we write $S_n(x)$ for
the number of zeros among the first $n$ decimals of $x$. Obviously the graph
of $S_n(x)$ is a step polygon whose discontinuities are necessarily points
of the form $a/10^n$, where $a$ is an integer. The ratio $S_n(x)/n$ is called
the frequency of zeros among the first $n$ decimals of $x$.

In the language of ordinary measure theory the weak law of large
numbers asserts that $S_n(x)/n \to 1/10$ in measure, whereas the strong
law states that $S_n(x)/n \to 1/10$ almost everywhere. Khintchine’s law
of the iterated logarithm shows that


(6.2)    $\limsup_{n \to \infty} \frac{S_n(x) - n/10}{(n \log \log n)^{1/2}} = 0.3 \sqrt{2}$


for almost all $x$. It gives an answer to a problem treated in a series
of papers initiated by Hausdorff ⁵ (1913) and Hardy and Littlewood ⁶
(1914). For a further improvement of this result see problems 7 and 8.

⁵ F. Hausdorff, Grundzüge der Mengenlehre, Leipzig, 1913.

⁶ Hardy and Littlewood, Some problems of Diophantine approximation, Acta
Mathematica, vol. 37 (1914), pp. 155-239.

Instead of the digit zero we may consider any other digit and can
formulate the strong law of large numbers to the effect that the
frequency of each of the ten digits tends to 1/10 for almost all $x$. A similar
theorem holds if the base 10 of the decimal system is replaced by any
other base. This fact was discovered by Borel (1909) and is usually
expressed by saying that almost all numbers are “normal.”


7. PROBLEMS FOR SOLUTION 


1. Find an integer $\beta$ such that in rolling dice there are about even chances
that a run of three consecutive aces appears before a non-ace run of length $\beta$.

2. Consider repeated independent trials with three possible outcomes A, B,
C and corresponding probabilities $p$, $q$, $r$ ($p + q + r = 1$). Find the
probability that a run of $\alpha$ consecutive A’s will occur before a B-run of length $\beta$.

3. Continuation. Find the probability that an A-run of length $\alpha$ will occur
before either a B-run of length $\beta$ or a C-run of length $\gamma$.

4. In a sequence of Bernoulli trials let $A_n$ be the event that a run of $n$
consecutive successes occurs between the $2^n$th and the $2^{n+1}$st trial. If $p \ge 1/2$,
there is probability one that infinitely many $A_n$ occur; if $p < 1/2$, then with
probability one only finitely many $A_n$ occur.

5.⁷ Denote by $N_n$ the length of the success run beginning at the $n$th trial
(i.e., $N_n = 0$ if the $n$th trial results in F, etc.). Prove that with probability
one


(7.1)    $\limsup \frac{N_n}{\operatorname{Log} n} = 1,$


where Log denotes the logarithm to the basis $1/p$.

Hint: Consider the event $A_n$ that the $n$th trial is followed by a run of more
than $a \operatorname{Log} n$ successes. For $a > 1$ the calculation is straightforward. For
$a < 1$ consider the subsequence of trials number $a_1$, $a_2$, ... where $a_n$ is an
integer very close to $n \operatorname{Log} n$.

6. From the law of the iterated logarithm conclude: With probability one
it will happen for infinitely many $n$ that all $S_k^*$ with $n < k < 17n$ are positive.
(Note: Considerably stronger statements can be proved using the results of
chapter III.)

7. Let $\phi(t)$ be a positive monotonically increasing function, and let $n_r$ be
the nearest integer to $e^{r/\log r}$. If


(7.2)    $\sum \frac{1}{\phi(n_r)} e^{-\frac{1}{2} \phi^2(n_r)}$


converges, then with probability one, the inequality


(7.3)    $S_n > np + (npq)^{1/2} \phi(n)$


takes place only for finitely many $n$. Note that without loss of generality we
may suppose that $\phi(n) < 10 (\log \log n)^{1/2}$; the law of the iterated logarithm
takes care of the larger $\phi(n)$.

⁷ Suggested by a communication from D. J. Newman.


8. Prove ⁸ that the series (7.2) converges if, and only if,


(7.4)    $\sum \frac{\phi(n)}{n} e^{-\frac{1}{2} \phi^2(n)}$


converges. (Hint: Collect the terms for which $n_{r-1} < n < n_r$ and note that
$n_r - n_{r-1} \sim n_r (1 - 1/\log r)$; furthermore, (7.4) can converge only if
$\phi^2(n) > 2 \log \log n$.)


⁸ Problems 7 and 8 together show that in case of convergence of (7.4) the
inequality (7.3) holds with probability one only for finitely many $n$. Conversely, if (7.4)
diverges, the inequality (7.3) holds with probability one for infinitely many $n$.
This converse is much more difficult to prove; cf. W. Feller, The general form of the
so-called law of the iterated logarithm, Transactions of the American Mathematical
Society, vol. 54 (1943), pp. 373-402, where more general theorems are proved for
arbitrary random variables. For the special case of Bernoulli trials with $p = 1/2$
cf. P. Erdös, On the law of the iterated logarithm, Annals of Mathematics (2), vol.
43 (1942), pp. 419-436. The law of the iterated logarithm follows from the particular
case $\phi(t) = \lambda (2 \log \log t)^{1/2}$.


CHAPTER IX 


Random Variables; Expectation 


1. RANDOM VARIABLES 


According to the definition given in calculus textbooks, the quantity
$y$ is called a function of the real number $x$ if to every $x$ there corresponds
a value $y$. This definition can be extended to cases where the
independent variable is not a real number. Thus we call the distance a
function of a pair of points; the perimeter of a triangle is a function
defined on the set of triangles; a sequence $\{a_n\}$ is a function defined for
all positive integers; the binomial coefficient $\binom{x}{k}$ is a function defined
for pairs of numbers $(x, k)$ of which the second is a non-negative
integer. In the same sense we can say that the number $S_n$ of successes in
$n$ Bernoulli trials is a function defined on the sample space; to each of
the $2^n$ points in this space there corresponds a number $S_n$.

A function defined on a sample space is called a random variable.
Throughout the preceding chapters we have been concerned with
random variables without using this term. Typical random variables are
the number of aces in a hand at bridge, of multiple birthdays in a
company of $n$ people, of success runs in $n$ Bernoulli trials. In each
case there is a unique rule which associates a number $X$ with any
sample point. The classical theory of probability was devoted mainly
to a study of the gambler’s gain, which is again a random variable; in
fact, every random variable can be interpreted as the gain of a real or
imaginary gambler in a suitable game. The position of a particle under
diffusion, the energy, temperature, etc., of physical systems are random
variables; but they are defined in non-discrete sample spaces, and their
study is therefore deferred. In the case of a discrete sample space we
can actually tabulate any random variable $X$ by enumerating in some
order all points of the space and associating with each the corresponding
value of $X$.


The term random variable is somewhat confusing; random function
would be more appropriate (the independent variable being a point in
sample space, that is, the outcome of an experiment).

Let $X$ be a random variable and let $x_1$, $x_2$, ... be the values which
it assumes; ¹ in most of what follows the $x_j$ will be integers. The
aggregate of all sample points on which $X$ assumes the fixed value $x_j$
forms the event that $X = x_j$; its probability is denoted by $P\{X = x_j\}$.
The function


(1.1)    $P\{X = x_j\} = f(x_j) \qquad (j = 1, 2, \ldots)$

is called the (probability) distribution ² of the random variable $X$. Clearly

(1.2)    $f(x_j) \ge 0, \qquad \sum f(x_j) = 1.$


With this terminology we can say that in Bernoulli trials the number
of successes $S_n$ is a random variable with probability distribution
$\{b(k; n, p)\}$, whereas the number of trials up to and including the first
success is a random variable with the distribution $\{q^{k-1} p\}$.
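The definition can be taken literally in a small program: a random variable is a function on the sample points, and its distribution is obtained by collecting the probabilities of all points with a common value. The sketch below does this for $n$ Bernoulli trials; the function names are hypothetical and exact fractions are used so that the result can be compared with $b(k; n, p)$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def distribution(variable, n: int, p: Fraction):
    """Tabulate f(x) = P{X = x} for a function X on the sample space of n Bernoulli trials."""
    f = defaultdict(Fraction)
    for point in product("SF", repeat=n):          # the 2**n sample points
        weight = Fraction(1)
        for letter in point:
            weight *= p if letter == "S" else 1 - p
        f[variable(point)] += weight               # collect points with equal value of X
    return dict(f)


def number_of_successes(point):
    """The random variable S_n as a function of the sample point."""
    return point.count("S")


if __name__ == "__main__":
    print(distribution(number_of_successes, 3, Fraction(1, 2)))
```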

¹ In the standard mathematical terminology the set of points $x_1$, $x_2$, ... should
be called the range of $X$. Unfortunately the statistical literature uses the term range
for the difference between the maximum and the minimum of $X$.

² For a discrete variable $X$ the probability distribution is the function $f(x_j)$
defined on the aggregate of values $x_j$ assumed by $X$. This term must be distinguished
from the term “distribution function,” which applies to non-decreasing functions
which tend to 0 as $x \to -\infty$ and to 1 as $x \to \infty$. The distribution function $F(x)$
of $X$ is defined by

    $F(x) = P\{X \le x\} = \sum_{x_j \le x} f(x_j),$

the last sum extending over all those $x_j$ which do not exceed $x$. Thus the
distribution function of a variable can be calculated from its probability distribution and
vice versa. In this volume we shall not be concerned with distribution functions
in general.

Consider now two random variables $X$ and $Y$ defined on the same
sample space, and denote the values which they assume, respectively,
by $x_1$, $x_2$, ..., and $y_1$, $y_2$, ...; let the corresponding probability
distributions be $\{f(x_j)\}$ and $\{g(y_k)\}$. The aggregate of points in which
the two conditions $X = x_j$ and $Y = y_k$ are satisfied forms an event
whose probability will be denoted by $P\{X = x_j, Y = y_k\}$. The function


(1.3)    $P\{X = x_j, Y = y_k\} = p(x_j, y_k) \qquad (j, k = 1, 2, \ldots)$


is called the joint probability distribution of $X$ and $Y$. It is best exhibited
in the form of a double-entry table as exemplified in tables 1 and 2.
Clearly


(1.4)    $p(x_j, y_k) \ge 0, \qquad \sum_{j,k} p(x_j, y_k) = 1.$

Moreover, for every fixed $j$

(1.5)    $p(x_j, y_1) + p(x_j, y_2) + p(x_j, y_3) + \cdots = P\{X = x_j\} = f(x_j)$

and for every fixed $k$

(1.6)    $p(x_1, y_k) + p(x_2, y_k) + p(x_3, y_k) + \cdots = P\{Y = y_k\} = g(y_k).$


In other words, by adding the probabilities in individual rows and 
columns, we obtain the probability distributions of X and Y. They 
may be exhibited as shown in tables 1 and 2 and are then called mar- 
ginal distributions. The adjective “marginal” refers to the outer ap- 
pearance in the double-entry table and is also used for stylistic clarity 
when the joint distribution of two variables and also their individual 
(marginal) distributions appear in the same context. Strictly speak- 
ing, the adjective “marginal” is redundant. 

The notion of joint distribution carries over to systems of more than 
two random variables. 
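The passage from a joint distribution to its marginal distributions is a matter of row and column sums. A minimal sketch follows, assuming the joint distribution is stored as a dictionary keyed by pairs of values; this representation and the names are choices made for the illustration.

```python
from collections import defaultdict
from fractions import Fraction


def marginals(joint):
    """Row and column sums of a joint distribution given as {(x, y): probability}."""
    f, g = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), prob in joint.items():
        f[x] += prob        # (1.5): sum over k for fixed x_j
        g[y] += prob        # (1.6): sum over j for fixed y_k
    return dict(f), dict(g)


if __name__ == "__main__":
    # Two tosses of a coin: X = heads at the first toss, Y = total number of heads.
    quarter = Fraction(1, 4)
    joint = {(0, 0): quarter, (0, 1): quarter, (1, 1): quarter, (1, 2): quarter}
    print(marginals(joint))
```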


Examples. (a) Random placements of 3 balls into 3 cells. We refer
to the sample space of 27 points defined formally in table 1
accompanying example I(2.a); to each point we attach probability 1/27. Let
$N$ denote the number of occupied cells, and for $i = 1, 2, 3$ let $X_i$ denote
the number of balls in the cell number $i$. These are picturesque
descriptions. Formally $N$ is the function assuming the value 1 on the
sample points number 1-3; the value 2 on the points number 4-21;
and the value 3 on the points number 22-27. Accordingly, the
probability distribution of $N$ is defined by $P\{N = 1\} = 1/9$, $P\{N = 2\} = 2/3$,
$P\{N = 3\} = 2/9$. The joint distributions of $(N, X_1)$ and of $(X_1, X_2)$
are given in tables 1 and 2.
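The distributions of this example can be recomputed by listing the 27 placements. The sketch below is an illustration with hypothetical function names; it labels the cells 0, 1, 2 rather than 1, 2, 3, and it does not depend on the particular numbering of sample points used in the table of chapter I.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def placements_joint():
    """Joint distribution of (N, X1, X2) over the 27 equally likely placements."""
    counts = Counter()
    for cells in product(range(3), repeat=3):      # cells[b] = cell receiving ball b
        n_occupied = len(set(cells))
        x1, x2 = cells.count(0), cells.count(1)
        counts[(n_occupied, x1, x2)] += 1
    return {key: Fraction(c, 27) for key, c in counts.items()}


def project(joint, *positions):
    """Marginal distribution of the coordinates at the given positions."""
    out = Counter()
    for key, prob in joint.items():
        out[tuple(key[i] for i in positions)] += prob
    return dict(out)


if __name__ == "__main__":
    joint = placements_joint()
    print(project(joint, 0))        # distribution of N
    print(project(joint, 0, 1))     # joint distribution of (N, X1)
    print(project(joint, 1, 2))     # joint distribution of (X1, X2)
```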

TABLE 1

JOINT DISTRIBUTION OF $(N, X_1)$ IN EXAMPLE (a)

                              $X_1$
    $N$            0      1      2      3      Distribution of $N$
     1            2q      0      0      q             3q
     2            6q     6q     6q      0            18q
     3             0     6q      0      0             6q
  Distribution
  of $X_1$        8q    12q     6q      q

$E(N) = 19/9$, $E(N^2) = 129/27$, $Var(N) = 26/81$
$E(X_1) = 1$, $E(X_1^2) = 5/3$, $Var(X_1) = 2/3$
$E(NX_1) = 19/9$, $Cov(N, X_1) = 0$.

$N$ is the number of occupied cells, $X_1$ the number of balls in the first cell
when 3 balls are distributed randomly in 3 cells. For abbreviation $q = 1/27$.


TABLE 2

JOINT DISTRIBUTION OF $(X_1, X_2)$ IN EXAMPLE (a)

                              $X_2$
   $X_1$           0      1      2      3      Distribution of $X_1$
     0             q     3q     3q      q             8q
     1            3q     6q     3q      0            12q
     2            3q     3q      0      0             6q
     3             q      0      0      0              q
  Distribution
  of $X_2$        8q    12q     6q      q

$E(X_i) = 1$, $E(X_i^2) = 5/3$, $Var(X_i) = 2/3$
$E(X_1 X_2) = 2/3$, $Cov(X_1, X_2) = -1/3$.

$X_i$ is the number of balls in the $i$th cell when 3 balls are distributed randomly
in 3 cells. For abbreviation $q = 1/27$.


(b) Dice. In $n$ throws of an ideal die let $X_1$, $X_2$, $X_3$, respectively,
denote the number of ones, twos, and threes. The probability $p(k_1, k_2, k_3)$
that the $n$ throws result in $k_1$ ones, $k_2$ twos, $k_3$ threes, and
$n - k_1 - k_2 - k_3$ other faces is given by the multinomial distribution VI(9.2)
with $p_1 = p_2 = p_3 = 1/6$, $p_4 = 1/2$, that is, by


(1.7)    $p(k_1, k_2, k_3) = \frac{n!}{k_1!\, k_2!\, k_3!\, (n - k_1 - k_2 - k_3)!} \left( \frac{1}{6} \right)^{k_1 + k_2 + k_3} \left( \frac{1}{2} \right)^{n - k_1 - k_2 - k_3}.$


This is the joint distribution of $X_1$, $X_2$, $X_3$. Keeping $k_1$, $k_2$ fixed and
summing (1.7) over the possible values $k_3 = 0, 1, \ldots, n - k_1 - k_2$, we
get, using the binomial theorem,


(1.8)    $p(k_1, k_2) = \frac{n!}{k_1!\, k_2!\, (n - k_1 - k_2)!} \cdot \frac{4^{n - k_1 - k_2}}{6^n}.$


This is the joint distribution of $(X_1, X_2)$, which now appears as
marginal distribution for the triple distribution of $X_1$, $X_2$, $X_3$. Needless
to say, (1.8) could have been obtained directly from the
multinomial distribution. Summing (1.8) once more over all $k_2 = 0, 1,
\ldots, n - k_1$ we obtain the distribution of $X_1$, namely the binomial
distribution with $p = 1/6$.
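The two summations of example (b) can be verified with exact arithmetic for any small $n$. The following sketch is an illustration; the function names are hypothetical and the formulas coded are (1.7), (1.8), and the binomial distribution with $p = 1/6$.

```python
from fractions import Fraction
from math import comb, factorial


def p3(n, k1, k2, k3):
    """Multinomial probability (1.7) for an ideal die."""
    rest = n - k1 - k2 - k3
    coeff = factorial(n) // (factorial(k1) * factorial(k2) * factorial(k3) * factorial(rest))
    return coeff * Fraction(1, 6) ** (k1 + k2 + k3) * Fraction(1, 2) ** rest


def p2(n, k1, k2):
    """Closed form (1.8) for the joint distribution of (X1, X2)."""
    rest = n - k1 - k2
    coeff = factorial(n) // (factorial(k1) * factorial(k2) * factorial(rest))
    return coeff * Fraction(4 ** rest, 6 ** n)


def p2_by_summation(n, k1, k2):
    """Sum of (1.7) over k3 = 0, 1, ..., n - k1 - k2."""
    return sum((p3(n, k1, k2, k3) for k3 in range(n - k1 - k2 + 1)), Fraction(0))


def binomial_sixth(n, k1):
    """Binomial distribution b(k1; n, 1/6)."""
    return comb(n, k1) * Fraction(1, 6) ** k1 * Fraction(5, 6) ** (n - k1)


if __name__ == "__main__":
    n = 5
    print(all(p2(n, a, b) == p2_by_summation(n, a, b)
              for a in range(n + 1) for b in range(n + 1 - a)))
```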

(c) Sampling. Let a population of $n$ elements be divided into three
classes of respective sizes $n_1 = np_1$, $n_2 = np_2$, and $n_3 = np_3$ (where
$p_1 + p_2 + p_3 = 1$). Suppose that a random sample of size $r$ is drawn,
and denote by $X_1$ and $X_2$ the numbers of representatives of the first
and second class in the sample. If the sample is with replacement,
$P\{X_1 = k_1, X_2 = k_2\}$ is given by the multinomial distribution


(1.9)    $f(k_1, k_2) = \frac{r!}{k_1!\, k_2!\, (r - k_1 - k_2)!} \, p_1^{k_1} p_2^{k_2} p_3^{r - k_1 - k_2}.$


[See formula VI(9.2).] The variable $X_1$ has the binomial distribution
$\{b(k; r, p_1)\}$. If the sampling is without replacement, then $P\{X_1 = k_1,
X_2 = k_2\}$ is given by the double hypergeometric distribution II(6.5)
and $X_1$ has the simple hypergeometric distribution II(6.1).

(d) Randomized sampling. Consider once more the preceding
example but suppose that the sample size $r$, instead of being fixed in advance,
depends on the outcome of a random experiment. More precisely,
suppose that the size of the sample depends on a Poisson distribution:
The probability that the sample size is $r$ is $p(r; \lambda) = e^{-\lambda} \lambda^r / r!$ and,
given the sample size $r$, the (conditional) probability that $X_1 = k_1$ and
$X_2 = k_2$ is $f(k_1, k_2)$ of (1.9). For the joint probability distribution of
$(X_1, X_2)$ we have then


(1.10)    $P\{X_1 = k_1, X_2 = k_2\} = e^{-\lambda} \sum_{r = k_1 + k_2}^{\infty} \lambda^r f(k_1, k_2) / r!$
          $= e^{-\lambda} \frac{(\lambda p_1)^{k_1} (\lambda p_2)^{k_2}}{k_1!\, k_2!} \sum_{k=0}^{\infty} \frac{(\lambda p_3)^k}{k!} = e^{-\lambda (1 - p_3)} \frac{(\lambda p_1)^{k_1} (\lambda p_2)^{k_2}}{k_1!\, k_2!}$

or

(1.11)    $P\{X_1 = k_1, X_2 = k_2\} = p(k_1; \lambda p_1) \, p(k_2; \lambda p_2).$


Summing over $k_2$ we find that $X_1$ has the Poisson distribution $p(k; \lambda p_1)$.
(Problem VI, 27 paraphrases the same statement.) The joint
distribution of $(X_1, X_2)$ takes on the form of a multiplication table of the
two marginal distributions $\{p(k; \lambda p_1)\}$ and $\{p(k; \lambda p_2)\}$. We shall
express this by saying that $X_1$ and $X_2$ are independent.
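The identity between (1.10) and (1.11) can be checked numerically by summing the series up to a large sample size. The sketch below is an illustration under stated assumptions: the function names are hypothetical, floating-point arithmetic is used, and the infinite series is truncated at `r_max`, which is harmless for moderate $\lambda$.

```python
import math


def poisson(k: int, lam: float) -> float:
    """Poisson probability p(k; lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def multinomial(r, k1, k2, p1, p2):
    """f(k1, k2) of (1.9) for sample size r."""
    p3 = 1.0 - p1 - p2
    k3 = r - k1 - k2
    coeff = math.factorial(r) // (math.factorial(k1) * math.factorial(k2) * math.factorial(k3))
    return coeff * p1 ** k1 * p2 ** k2 * p3 ** k3


def randomized_joint(k1, k2, lam, p1, p2, r_max=100):
    """First expression in (1.10): mixture of (1.9) over a Poisson sample size, truncated."""
    return sum(poisson(r, lam) * multinomial(r, k1, k2, p1, p2)
               for r in range(k1 + k2, r_max + 1))


if __name__ == "__main__":
    lam, p1, p2 = 4.0, 0.2, 0.3
    for k1, k2 in ((0, 0), (2, 1), (3, 4)):
        print(randomized_joint(k1, k2, lam, p1, p2),
              poisson(k1, lam * p1) * poisson(k2, lam * p2))
```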


With the notation (1.3) the conditional probability of the event
$Y = y_k$, given that $X = x_j$ (with $f(x_j) > 0$), becomes


(1.12)    $P\{Y = y_k \mid X = x_j\} = \frac{p(x_j, y_k)}{f(x_j)}.$


It is convenient to abbreviate (1.12) to $P\{Y = y_k \mid X\}$; this defines the
(conditional) distribution of $Y$ for given $X$. A glance at tables 1 and 2
shows that the conditional probability (1.12) is in general different
from $g(y_k)$. This indicates that inference can be drawn from the values
of $X$ to those of $Y$ and vice versa; the two variables are (stochastically)
dependent. The strongest degree of dependence exists when $Y$ is a
function of $X$, that is, when the value of $X$ uniquely determines $Y$. For
example, if a coin is tossed $n$ times and $X$ and $Y$ are the numbers of
heads and tails, then $Y = n - X$. Similarly, when $Y = X^2$, we can
compute $Y$ from $X$. In the joint distribution this means that in each
row all entries but one are zero. If, on the other hand, $p(x_j, y_k) =
f(x_j) g(y_k)$ for all combinations of $x_j$, $y_k$, then the events $X = x_j$ and
$Y = y_k$ are independent; the joint distribution assumes the form of a
multiplication table. In this case we speak of independent random
variables. They occur in particular in connection with independent
trials; for example, the numbers scored in two throws of a die are
independent. An example of a different nature is found in example (d).
Note that the joint distribution of $X$ and $Y$ determines the distributions
of $X$ and $Y$, but that we cannot calculate the joint distribution
of $X$ and $Y$ from their marginal distributions. If two variables $X$ and
$Y$ have the same distribution, they may or may not be independent.
For example, the two variables $X_1$ and $X_2$ in table 2 have the same
distribution and are dependent.

All our notions apply also to the case of more than two variables.
We recapitulate in the formal


Definition. A random variable $X$ is a function defined on a given
sample space, that is, an assignment of a real number to each sample
point. The probability distribution of $X$ is the function defined in (1.1).
If two random variables $X$ and $Y$ are defined on the same sample space,
their joint distribution is given by (1.3) and assigns probabilities to all
combinations $(x_j, y_k)$ of values assumed by $X$ and $Y$. This notion carries
over, in an obvious manner, to any finite set of variables $X$, $Y$, ..., $W$
defined on the same sample space. These variables are called mutually
independent if, for any combination of values $(x, y, \ldots, w)$ assumed by
them,


(1.13)    $P\{X = x, Y = y, \ldots, W = w\} = P\{X = x\} P\{Y = y\} \cdots P\{W = w\}.$
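Whether a given joint distribution has the form of a multiplication table can be tested directly. The sketch below, with hypothetical function names, applies the test for two variables to two throws of a die and to the variables $X_1$, $X_2$ of example (a); in the latter case the two marginal distributions coincide although the variables are dependent.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, position):
    """Marginal distribution of one coordinate of {(x, y): probability}."""
    out = {}
    for key, prob in joint.items():
        out[key[position]] = out.get(key[position], Fraction(0)) + prob
    return out


def is_independent(joint):
    """True if p(x, y) = f(x) g(y) for every combination of values."""
    f, g = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((x, y), Fraction(0)) == f[x] * g[y] for x in f for y in g)


def two_dice():
    """Numbers scored in two throws of an ideal die."""
    return {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}


def balls_in_cells():
    """Joint distribution of (X1, X2) for 3 balls placed at random in 3 cells."""
    joint = {}
    for cells in product(range(3), repeat=3):
        key = (cells.count(0), cells.count(1))
        joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 27)
    return joint


if __name__ == "__main__":
    print(is_independent(two_dice()), is_independent(balls_in_cells()))
```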


In chapter V, section 4, we have defined the sample space corre- 
sponding to n mutually independent trials. Comparing this definition 
to (1.13), we see that if Xy depends only on the outcome of the kth trial, 
then the variables X;, ..., Xn are mutually independent. More generally, 
if a random variable U depends only on the outcomes of the first k 
trials, and another variable V depends only on the outcomes of the 
last n—k trials, then U and V are independent (cf. problem 39). 

We may conceive of a random variable as a labeling of the points 
of the sample space. This procedure is familiar from dice, where the 
faces are numbered, and we speak of numbers as the possible outcomes 
of individual trials. In conventional mathematical terminology we 
could say that a random variable X is a mapping of the original sample 
space onto a new space whose points are $x_1, x_2, \ldots$. Therefore:

Whenever $\{f(x_j)\}$ satisfies the obvious conditions (1.2) it is legitimate
to talk of a random variable X, assuming the values $x_1, x_2, \ldots$
with probabilities $f(x_1), f(x_2), \ldots$, without further reference to the old
sample space; a new one is formed by the sample points $x_1, x_2, \ldots$.
Specifying a probability distribution is equivalent to specifying a sample
space whose points are real numbers. Speaking of two independent random
variables X and Y with distributions $\{f(x_j)\}$ and $\{g(y_k)\}$ is equivalent
to referring to a sample space whose points are pairs of numbers
$(x_j, y_k)$ to which probabilities are assigned by the rule
$P\{(x_j, y_k)\} = f(x_j)\,g(y_k)$. Similarly, for the sample space corresponding to a set of
n random variables (X, Y, ..., W) we can take an aggregate of points
(x, y, ..., w) in the n-dimensional space to which probabilities are
assigned by the joint distribution. The variables are mutually independent
if their joint distribution is given by (1.13).
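The multiplication-table criterion can be checked mechanically for a finite joint distribution. The following Python sketch is only an illustration: the helper names `marginals` and `is_independent` and the second example (a pair in which Y is forced to equal X) are assumptions introduced here, not part of the text. It also shows that two joint distributions can share the same marginals.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the two marginal distributions of a joint table {(x, y): prob}."""
    fx, gy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0) + p
        gy[y] = gy.get(y, 0) + p
    return fx, gy


def is_independent(joint):
    """True if p(x, y) = f(x) g(y) for every combination of values."""
    fx, gy = marginals(joint)
    return all(joint.get((x, y), 0) == fx[x] * gy[y] for x in fx for y in gy)


# Two throws of a die: every pair of scores has probability 1/36.
two_throws = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}

# A dependent pair with the same marginals (illustrative): Y always equals X.
same_score = {(i, i): Fraction(1, 6) for i in range(1, 7)}

print(is_independent(two_throws))                      # True
print(is_independent(same_score))                      # False
print(marginals(two_throws) == marginals(same_score))  # True
```

Exact fractions are used so that the product rule is tested by equality rather than by a floating-point tolerance.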

Example. (e) Bernoulli trials with variable probabilities. Consider 
n independent trials, each of which has only two possible outcomes, 
S and F. The probability of S at the kth trial is $p_k$, that of F is
$q_k = 1 - p_k$. If $p_k = p$, this scheme reduces to Bernoulli trials. The
simplest way of describing it is to attribute the values 1 and 0 to S
and F. The model is then completely described by saying that we
have n mutually independent random variables $X_k$ with distributions
$P\{X_k = 1\} = p_k$, $P\{X_k = 0\} = q_k$. This scheme is known under the
confusing name of “Poisson trials.” [See examples (5.b) and XI(6.b).]


It is clear that the same distribution can occur in conjunction with 
different sample spaces. If we say that the random variable X assumes 
the values 0 and 1 with probabilities 1/2, then we refer tacitly to a
sample space consisting of the two points 0 and 1. However, the variable
X might have been defined by stipulating that it equals 0 or 1
according as the tenth tossing of a coin produces heads or tails; in
this case X is defined in a sample space of sequences (HHT...), and
this sample space has $2^{10}$ points.

In principle, it is possible to restrict the theory of probability to 
sample spaces defined in terms of probability distributions of random 
variables. This procedure avoids references to abstract sample spaces 
and also to terms like “trials” and “outcomes of experiments.” The 
reduction of probability theory to random variables is a short cut to 
the use of analysis and simplifies the theory in many ways. However, 
it also has the drawback of obscuring the probability background. The 
notion of random variable easily remains vague as “something that
takes on different values with different probabilities.” But random
variables are ordinary functions, and this notion is by no means peculiar
to probability theory.


Example. (f) Let X be a random variable with possible values
$x_1, x_2, \ldots$ and corresponding probabilities $f(x_1), f(x_2), \ldots$. If it helps
the reader’s imagination, he may always construct a conceptual experiment
leading to X. For example, subdivide a roulette wheel into arcs
$l_1, l_2, \ldots$ whose lengths are as $f(x_1) : f(x_2) : \cdots$. Imagine a gambler
receiving the (positive or negative) amount $x_j$ if the roulette comes to
rest at a point of $l_j$. Then X is the gambler’s gain. In n trials, the gains
are assumed to be n independent variables with the common distribution
$\{f(x_j)\}$. To obtain two variables with a given joint distribution
$\{p(x_j, y_k)\}$ let an arc correspond to each combination $(x_j, y_k)$ and
think of two gamblers receiving the amounts $x_j$ and $y_k$, respectively.

If X, Y, Z, ... are random variables defined on the same sample
space, then any function F(X, Y, Z, ...) is again a random variable.
Its distribution can be obtained from the joint distribution of X, Y,
Z, ... simply by collecting the terms which correspond to combinations
of (X, Y, Z, ...) giving the same value of F(X, Y, Z, ...).

Example. (g) In the example illustrated by table 2 the sum
$X_1 + X_2$ is a random variable assuming the values 0, 1, 2, 3 with
probabilities g, 6g, 12g, 8g (where g = 1/27). The product $X_1 X_2$ assumes
the values 0, 1, 2 with probabilities 15g, 6g, 6g.
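The procedure of collecting terms can be sketched in Python. The sketch assumes that table 2 refers to the random placement of three balls in three cells, with $X_1$ and $X_2$ the numbers of balls in the first and second cell (an assumption consistent with g = 1/27, since there are 27 equally likely placements); the function name `distribution_of` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution_of(func, balls=3, cells=3):
    """Distribution of func(X1, X2), found by collecting equal values."""
    weight = Fraction(1, cells ** balls)  # each placement is equally likely
    dist = Counter()
    for placement in product(range(cells), repeat=balls):
        x1 = placement.count(0)  # balls in the first cell
        x2 = placement.count(1)  # balls in the second cell
        dist[func(x1, x2)] += weight
    return dict(dist)


sum_dist = distribution_of(lambda x1, x2: x1 + x2)
prod_dist = distribution_of(lambda x1, x2: x1 * x2)
for value in sorted(sum_dist):
    print("sum", value, sum_dist[value])
for value in sorted(prod_dist):
    print("product", value, prod_dist[value])
```

Under the stated assumption the enumeration reproduces the probabilities g, 6g, 12g, 8g for the sum and 15g, 6g, 6g for the product (printed in lowest terms).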


2. EXPECTATIONS 


To achieve reasonable simplicity it is often necessary to describe
probability distributions rather summarily by a few “typical values.”
An example is provided by the median which was used above in connection
with waiting times. The median $x_m$ of the distribution (1.1)
is that value assumed by X for which $P\{X < x_m\} \le 1/2$ and also
$P\{X > x_m\} \le 1/2$. In other words, $x_m$ is chosen so that the probabilities
of X exceeding or falling short of $x_m$ are as close to 1/2 as possible.

However, among the typical values the expectation or mean is by 
far the most important. It lends itself best to analytical manipula- 
tions, and it is preferred by statisticians because of a property known 
as sampling stability. Its definition follows the customary notion of 
an average. If in a certain population $n_k$ families have exactly k children,
the total number of families is $n = n_0 + n_1 + n_2 + \cdots$ and the
total number of children $m = n_1 + 2n_2 + 3n_3 + \cdots$. The average
number of children per family is m/n. The analogy between probabilities
and frequencies suggests the following


Definition. Let X be a random variable assuming the values $x_1, x_2, \ldots$
with corresponding probabilities $f(x_1), f(x_2), \ldots$. The mean or
expected value of X is defined by

(2.1)    $E(X) = \sum_k x_k f(x_k)$

provided that the series converges absolutely. In this case we say that X
has a finite expectation. If $\sum_k |x_k| f(x_k)$ diverges, then we say that X has
no finite expectation.


It goes without saying that the most common random variables have 
finite expectations; otherwise the concept would be impractical. How- 
ever, variables without finite expectations occur in connection with 
important recurrence problems in physics. The terms mean, average,
and mathematical expectation are synonymous. We also speak of the
mean of a distribution instead of referring to a corresponding random
variable. The notation E(X) is generally accepted in mathematics
and statistics. In physics $\bar{X}$, $\langle X \rangle$, $\langle X \rangle_{\mathrm{av}}$ are common substitutes
for E(X).

We wish to calculate expectations of functions such as $X^2$. This
function is a new random variable assuming the values $x_k^2$; in general,
the probability of $X^2 = x_k^2$ is not $f(x_k)$ but $f(x_k) + f(-x_k)$, and $E(X^2)$
is defined as the sum of $x_k^2 \{f(x_k) + f(-x_k)\}$ for all k such that $x_k \ge 0$.
Obviously

(2.2)    $E(X^2) = \sum_k x_k^2 f(x_k)$

provided the series converges. The same procedure of collecting terms
leads to the general

Theorem 1. Any function $\phi(x)$ defines a new random variable $\phi(X)$.
If $\phi(X)$ has finite expectation, then

(2.3)    $E(\phi(X)) = \sum_k \phi(x_k) f(x_k)$;

the series converges absolutely if, and only if, $E(\phi(X))$ exists. For any
constant a we have E(aX) = aE(X).


If several random variables $X_1, \ldots, X_n$ are defined on the same
sample space, then their sum $X_1 + \cdots + X_n$ is a new random variable.
Its possible values and the corresponding probabilities can be readily
found from the joint distribution of the $X_k$ and thus $E(X_1 + \cdots + X_n)$
can be calculated. A simpler procedure is furnished by the following
important

Theorem 2. If $X_1, X_2, \ldots, X_n$ are random variables with expectations,
then the expectation of their sum exists and is the sum of their
expectations:

(2.4)    $E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n)$.


Proof. It suffices to prove (2.4) for two variables X and Y. Using
the notation (1.3), we can write

(2.5)    $E(X) + E(Y) = \sum_{j,k} x_j\, p(x_j, y_k) + \sum_{j,k} y_k\, p(x_j, y_k)$,

the summation extending over all possible values $x_j$, $y_k$ (which need
not be all different). The two series converge absolutely; their sum can
therefore be rearranged to give $\sum_{j,k} (x_j + y_k)\, p(x_j, y_k)$, which is by definition
the expectation of X + Y. This accomplishes the proof.


Clearly, no corresponding general theorem holds for products; for
example, $E(X^2)$ is generally different from $(E(X))^2$. Thus, if X is the
number scored with a balanced die, $E(X) = 7/2$, but
$E(X^2) = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6$. However, the simple multiplication
rule holds for mutually independent variables.
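The die computation can be reproduced with a few lines of Python. This is a minimal sketch of formula (2.3) for a finite distribution; the function name `expectation` and the dictionary representation {value: probability} are choices made here for illustration.

```python
from fractions import Fraction


def expectation(dist, phi=lambda x: x):
    """E(phi(X)) = sum of phi(x_k) f(x_k) for a finite distribution {x_k: f(x_k)}."""
    return sum(phi(x) * p for x, p in dist.items())


die = {k: Fraction(1, 6) for k in range(1, 7)}

mean = expectation(die)                            # E(X)
second_moment = expectation(die, lambda x: x * x)  # E(X^2)
print(mean, second_moment, mean ** 2)              # 7/2 91/6 49/4

# The addition rule (2.4) needs no independence: E(X + X^2) = E(X) + E(X^2).
print(expectation(die, lambda x: x + x * x) == mean + second_moment)  # True
```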


Theorem 3. If X and Y are mutually independent random variables
with finite expectations, then their product is a random variable with finite
expectation and

(2.6)    $E(XY) = E(X)\,E(Y)$.

Proof. To calculate E(XY) we should multiply each possible value
$x_j y_k$ with the corresponding probability. We have already remarked
that the values $x_k$ in the definition (2.1) need not be different. Hence

(2.7)    $E(XY) = \sum_{j,k} x_j y_k\, f(x_j)\, g(y_k) = \Bigl\{\sum_j x_j f(x_j)\Bigr\} \Bigl\{\sum_k y_k g(y_k)\Bigr\}$,

the rearrangement being justified since the series converge absolutely.
This proves the theorem. By induction the same multiplication rule
holds for any number of mutually independent random variables.


It is convenient to have a notation also for the expectation of a conditional
probability distribution. If X and Y are two random variables
with the joint distribution (1.3), the conditional expectation E(Y|X) of
Y for given X is the function which at $x_j$ assumes the value

(2.8)    $\sum_k y_k P\{Y = y_k \mid X = x_j\} = \frac{\sum_k y_k\, p(x_j, y_k)}{f(x_j)}$,

provided the series converges absolutely and $f(x_j) > 0$ for all j.
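Formula (2.8) translates directly into a computation on a finite joint table. In the sketch below the function name `conditional_expectation` and the example (X the first of two throws of a die, Y the larger of the two scores) are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction


def conditional_expectation(joint):
    """Return {x_j: E(Y | X = x_j)} for a finite joint table {(x, y): p(x, y)}."""
    f = {}          # marginal distribution f(x_j)
    weighted = {}   # sum over k of y_k p(x_j, y_k)
    for (x, y), p in joint.items():
        f[x] = f.get(x, 0) + p
        weighted[x] = weighted.get(x, 0) + y * p
    return {x: weighted[x] / f[x] for x in f if f[x] > 0}


# Illustrative joint distribution: X = first throw, Y = larger of two throws.
joint = {}
for a in range(1, 7):
    for b in range(1, 7):
        key = (a, max(a, b))
        joint[key] = joint.get(key, 0) + Fraction(1, 36)

cond = conditional_expectation(joint)
print(cond[1])  # 7/2
print(cond[6])  # 6
```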


3. EXAMPLES AND APPLICATIONS 

(a) Binomial distribution. Let $S_n$ be the number of successes in n
Bernoulli trials with probability p for success. We know that $S_n$ has
the binomial distribution $\{b(k; n, p)\}$, whence
$E(S_n) = \sum_k k\, b(k; n, p) = np \sum_k b(k-1; n-1, p)$. The last sum includes all terms of the
binomial distribution for n - 1 and hence equals 1. Therefore the mean
of the binomial distribution is

(3.1)    $E(S_n) = np$.

The same result could have been obtained without calculation by a
method which is often expedient. Let $X_k$ be the number of successes
scored at the kth trial. This random variable assumes only the values 0
and 1 with corresponding probabilities q and p. Hence
$E(X_k) = 0 \cdot q + 1 \cdot p = p$, and since

(3.2)    $S_n = X_1 + X_2 + \cdots + X_n$,

we get (3.1) directly from (2.4).

(b) Poisson distribution. If X has the Poisson distribution
$p(k; \lambda) = e^{-\lambda} \lambda^k / k!$ (where k = 0, 1, ...) then

    $E(X) = \sum_k k\, p(k; \lambda) = \lambda \sum_k p(k-1; \lambda)$.

The last series contains all terms of the distribution and therefore adds
to unity. Accordingly, the Poisson distribution $\{e^{-\lambda} \lambda^k / k!\}$ has the
mean $\lambda$.

(c) Negative binomial distribution. Let X be a variable with the
geometric distribution $P\{X = k\} = q^k p$ where k = 0, 1, 2, .... Then
$E(X) = qp(1 + 2q + 3q^2 + \cdots)$. On the right we have the derivative
of a geometric series so that $E(X) = qp(1 - q)^{-2} = q/p$. We have
seen in chapter VI, section 8, that X may be interpreted as the number
of failures preceding the first success in a sequence of Bernoulli trials.
More generally, we have studied the sample space corresponding to
Bernoulli trials which are continued until the nth success. For $r \le n$,
let $X_1 = X$, and let $X_r$ be the number of failures following the
(r-1)st success and preceding the rth success. Then each $X_\nu$ has the
geometric distribution $\{q^k p\}$, and $E(X_\nu) = q/p$. The sum
$Y_r = X_1 + \cdots + X_r$ is the number of failures preceding the rth success. In
other words, $Y_r$ is a random variable whose distribution is the negative
binomial defined by either of the two equivalent formulas VI(8.1) or
VI(8.2). It follows that the mean of this negative binomial is rq/p.
This can be verified by direct computation. From VI(8.2) it is clear
that $k f(k; r, p) = r p^{-1} q\, f(k-1; r+1, p)$, and the terms of the distribution
$\{f(k-1; r+1, p)\}$ add to unity. This direct calculation has the
advantage that it applies also to non-integral r. On the other hand, the
first argument leads to the result without requiring knowledge of the
explicit form of the distribution of $X_1 + \cdots + X_r$.
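A numerical check of the mean rq/p, including a non-integral r, may be sketched as follows. The sketch assumes that VI(8.1) has the usual form $f(k; r, p) = \binom{r+k-1}{k} p^r q^k$, so that consecutive terms satisfy $f(k) = f(k-1)(r + k - 1)q/k$; the function name and the truncation at 2000 terms are choices made here.

```python
def negative_binomial_mean(r, p, terms=2000):
    """Approximate the sum of k f(k; r, p), using f(k) = f(k-1) (r + k - 1) q / k."""
    q = 1.0 - p
    f = p ** r          # f(0; r, p)
    mean = 0.0
    for k in range(1, terms):
        f *= (r + k - 1) * q / k   # step from f(k-1) to f(k); valid for real r > 0
        mean += k * f
    return mean


for r, p in [(1, 0.5), (3, 0.25), (2.5, 0.4)]:
    # The two printed numbers should agree up to rounding: series sum and rq/p.
    print(r, p, negative_binomial_mean(r, p), r * (1 - p) / p)
```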

(d) Waiting times in sampling. A population of N distinct elements
is sampled with replacement. Because of repetitions a random sample
of size r will in general contain fewer than r distinct elements. As the
sample size increases, new elements will enter the sample more and
more rarely. We are interested in the sample size $S_r$ necessary for
the acquisition of r distinct elements. (As a special case, consider the
population of N = 365 possible birthdays; here $S_r$ represents the number
of people sampled up to the moment where the sample contains r
different birthdays. A similar interpretation is possible with random
placements of balls into cells. Our problem is of particular interest to
collectors of coupons and other items where the acquisition can be
compared to random sampling.)


3 G. Polya, Eine Wahrscheinlichkeitsaufgabe zur Kundenwerbung, Zeitschrift für
Angewandte Mathematik und Mechanik, vol. 10 (1930), pp. 96-97. Polya treats a
slightly more general problem with different methods. There exists a huge literature
treating variants of the coupon collector's problem. [Cf. problem 24,
XI, 12-14, and II(11.12).]

The first element enters the sample at the first drawing. The number
of drawings from the second up to and including the drawing at
which a new element enters the sample is a random variable $X_1$; generally,
let $X_r$ be the number of drawings following the selection of the
rth element up to and including the selection of the next new element.
Then $S_r = 1 + X_1 + \cdots + X_{r-1}$ is the sample size at the moment that
the rth element enters the sample. Once the sample contains k different
elements the probability of drawing a new one is at each drawing
p = (N - k)/N. The number, $X_k$, of drawings up to and including
the drawing of a new element equals one plus the number of failures
preceding the first success in Bernoulli trials with p = (N - k)/N.
Therefore $E(X_k) = 1 + q/p = N/(N - k)$ and, from the addition theorem
(2.4),

(3.3)    $E(S_r) = N \left\{ \frac{1}{N} + \frac{1}{N-1} + \frac{1}{N-2} + \cdots + \frac{1}{N-r+1} \right\}$.

For r = N we get the expected number of drawings necessary to
exhaust the entire population. For N = 10 we have $E(S_{10}) = 29.29\ldots$,
and $E(S_5) = 6.46\ldots$. This means that we can expect to cover half
the population in about six to seven drawings, whereas the second half
requires some 23 more drawings. A reasonable approximation to (3.3)
for large N is

(3.4)    $E(S_r) \approx N \log \frac{N}{N - r}$.

In particular, for any fraction a < 1 the expected number of drawings
required to obtain a sample containing about the fraction a of the entire
population is, for large N, approximately N log {1/(1 - a)}; the expected
number of drawings necessary to have all N elements included in the sample
is, approximately, N log N. Note that our results are again obtained
without use of the distribution.
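The numerical values quoted for N = 10 follow from (3.3) by direct summation. The Python sketch below evaluates the harmonic-type sum exactly and then compares it, for one illustrative choice (N = 1000, a = 1/2, chosen here and not taken from the text), with the logarithmic approximation N log {1/(1 - a)}.

```python
from fractions import Fraction
from math import log


def expected_sample_size(N, r):
    """E(S_r) from (3.3): N (1/N + 1/(N-1) + ... + 1/(N-r+1))."""
    return N * sum(Fraction(1, N - k) for k in range(r))


print(float(expected_sample_size(10, 10)))  # 29.2896...
print(float(expected_sample_size(10, 5)))   # 6.4563...

# Logarithmic approximation for a fixed fraction a = r/N of a large population.
N, a = 1000, 0.5
print(float(expected_sample_size(N, int(a * N))), N * log(1 / (1 - a)))
```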

(e) An estimation problem. A bowl contains balls numbered 1 to N.
Let X be the largest number drawn in n drawings when random sampling
with replacement is used. The event $X \le k$ means that each of n
numbers drawn is less than or equal to k and therefore
$P\{X \le k\} = (k/N)^n$. Hence the probability distribution of X is given by

(3.5)    $p_k = P\{X = k\} = P\{X \le k\} - P\{X \le k-1\} = \{k^n - (k-1)^n\} N^{-n}$.

It follows that

(3.6)    $E(X) = \sum_{k=1}^{N} k\, p_k = N^{-n} \sum_{k=1}^{N} \{k^{n+1} - (k-1)^{n+1} - (k-1)^n\} = N^{-n} \Bigl\{ N^{n+1} - \sum_{k=1}^{N} (k-1)^n \Bigr\}$.

For large N the last sum is approximately the area under the curve
$y = x^n$ from x = 0 to x = N, that is, $N^{n+1}/(n+1)$. It follows that
for large N

(3.7)    $E(X) \approx \frac{n}{n+1} N$.

If a town has N = 1000 cars and a sample of n = 10 is observed, the
expected number of the highest observed license plate (assuming randomness)
is about 910. (The median is 934.) The practical statistician
uses the observed maximum in a sample to estimate the unknown true
number N. This method was used during the last war to estimate
enemy production (cf. problems 8-11).
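The license-plate figures can be recomputed from (3.5) without any approximation. In this sketch the function name `max_distribution` is an illustrative choice; the median is taken as the smallest k for which $P\{X \le k\} \ge 1/2$.

```python
from fractions import Fraction


def max_distribution(N, n):
    """p_k = P{X = k} = (k^n - (k-1)^n) / N^n for k = 1, ..., N, as in (3.5)."""
    return {k: Fraction(k ** n - (k - 1) ** n, N ** n) for k in range(1, N + 1)}


N, n = 1000, 10
p = max_distribution(N, n)
mean = sum(k * pk for k, pk in p.items())

# Smallest k with P{X <= k} = (k/N)^n >= 1/2.
median = next(k for k in range(1, N + 1) if Fraction(k, N) ** n >= Fraction(1, 2))

print(float(mean), n * N / (n + 1))   # exact mean against the approximation (3.7)
print(median)                         # 934
```

The exact mean comes out slightly above the approximation nN/(n + 1), in agreement with the rounded value of about 910 given in the text.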

(f) Banach’s match box problem. In chapter VI, section 8, we found
the distribution

(3.8)    $u_r = \binom{2N-r}{N} 2^{-2N+r}$

for the number X of matches left at the moment when the first box is
found empty. We are unable to calculate the expectation $E(X) = \mu$
in a direct way, but the following indirect way is applicable in many
similar cases. Using the fact that the $u_r$ add to unity (which is not
easily verified), we find

(3.9)    $N - \mu = \sum_{r=0}^{N} (N - r)\, u_r = \sum_{r=0}^{N-1} (N - r) \binom{2N-r}{N} 2^{-2N+r}$.

By a simple operation on the binomial coefficients the last sum is
transformed into

(3.10)    $\sum_{r=0}^{N-1} \frac{2N-r}{2} \binom{2N-r-1}{N} 2^{-2N+r+1} = \frac{2N+1}{2} \sum_{r=0}^{N-1} u_{r+1} - \frac{1}{2} \sum_{r=0}^{N-1} (r+1)\, u_{r+1}$.

The last sum is identical with the sum defining $\mu = E(X)$. In the
first sum all $u_r$ except $u_0$ occur, and hence the terms add to $1 - u_0$.
Thus from (3.9) and (3.10)

(3.11)    $N - \mu = \frac{2N+1}{2} (1 - u_0) - \frac{\mu}{2}$

or

(3.12)    $\mu = (2N+1)\, u_0 - 1 = \frac{2N+1}{2^{2N}} \binom{2N}{N} - 1$.

Using Stirling’s formula, we find

(3.13)    $\mu \approx 2 (N/\pi)^{1/2} - 1$.

In particular, in the distribution of chapter VI, table 8, we had N = 50.
For it $\mu = 7.04\ldots$ and the median is 6.
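The indirect argument can be checked against a direct summation in exact arithmetic. The sketch below uses the distribution (3.8) as reconstructed above; the function name `matchbox_distribution` is an illustrative choice.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def matchbox_distribution(N):
    """u_r = C(2N - r, N) 2^(-(2N - r)) for r = 0, ..., N, as in (3.8)."""
    return [Fraction(comb(2 * N - r, N), 2 ** (2 * N - r)) for r in range(N + 1)]


N = 50
u = matchbox_distribution(N)
mean_direct = sum(r * ur for r, ur in enumerate(u))   # definition of E(X)
mean_closed = (2 * N + 1) * u[0] - 1                  # formula (3.12)

print(sum(u) == 1)                  # the u_r add to unity
print(mean_direct == mean_closed)   # direct sum agrees with (3.12)
print(float(mean_closed), 2 * sqrt(N / pi) - 1)       # exact value and (3.13)
```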

4. THE VARIANCE

Let X be a random variable with distribution $\{f(x_j)\}$, and let r > 0
be an integer. If the expectation of the random variable $X^r$, that is,

(4.1)    $E(X^r) = \sum_j x_j^r f(x_j)$,

exists, then it is called the rth moment of X about the origin. If the series
does not converge absolutely, we say that the rth moment does not
exist. Since $|X|^{r-1} \le |X|^r + 1$, it follows that whenever the rth moment
exists so does the (r-1)st, and hence all preceding moments.

Moments play an important role in the general theory, but in the 
present volume we shall use only the second moment. If it exists, so 
does the mean 


(4.2)    $\mu = E(X)$.

It is then natural to introduce instead of the random variable its
deviation from the mean, $X - \mu$. Since $(x - \mu)^2 \le 2(x^2 + \mu^2)$ we see
that the second moment of $X - \mu$ exists whenever $E(X^2)$ exists. We
find

(4.3)    $E((X - \mu)^2) = \sum_j (x_j^2 - 2\mu x_j + \mu^2) f(x_j)$.

Splitting the right side into three individual sums, we find it equal to
$E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$.


Definition. Let X be a random variable with second moment $E(X^2)$
and let $\mu = E(X)$ be its mean. We define a number called the variance of
X by

(4.4)    $\mathrm{Var}(X) = E((X - \mu)^2) = E(X^2) - \mu^2$.

Its positive square root (or zero) is called the standard deviation of X.

For simplicity we often speak of the variance of a distribution without
mentioning the random variable. “Dispersion” is a synonym for
the now generally accepted term “variance.”


Examples. (a) If X assumes the values $\pm c$, each with probability
1/2, then $\mathrm{Var}(X) = c^2$.

(b) If X is the number of points scored with a symmetric die, then
$\mathrm{Var}(X) = (1^2 + 2^2 + \cdots + 6^2)/6 - (7/2)^2 = 35/12$.

(c) For the Poisson distribution $p(k; \lambda)$ the mean is $\lambda$ [cf. example
(3.b)] and hence the variance is
$\sum_k k^2 p(k; \lambda) - \lambda^2 = \lambda \sum_k k\, p(k-1; \lambda) - \lambda^2 = \lambda \sum_k (k-1)\, p(k-1; \lambda) + \lambda \sum_k p(k-1; \lambda) - \lambda^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.
In this case mean and variance are equal.

(d) For the binomial distribution [cf. example (3.a)] a similar computation
shows that the variance is

    $\sum_k k^2 b(k; n, p) - (np)^2 = np \sum_k k\, b(k-1; n-1, p) - (np)^2 = np\{(n-1)p + 1\} - (np)^2 = npq$.

The usefulness of the notion of variance will appear only gradually, 
in particular, in connection with limit theorems of chapter X. Here 
we observe that the variance is a rough measure of spread. In fact, if 
$\mathrm{Var}(X) = \sum_j (x_j - \mu)^2 f(x_j)$ is small, then each term in the sum is small.
A value $x_j$ for which $|x_j - \mu|$ is large must therefore have a small
probability $f(x_j)$. In other words, in case of small variance large deviations
of X from the mean $\mu$ are improbable. Conversely, a large variance
indicates that not all values assumed by X lie near the mean.


Some readers may be helped by the following interpretation in mechanics. Suppose
that a unit mass is distributed on the x-axis so that the mass $f(x_j)$ is concentrated
at the point $x_j$. Then the mean $\mu$ is the abscissa of the center of gravity,
and the variance is the moment of inertia. Clearly different mass distributions may
have the same center of gravity and the same moment of inertia, but it is well
known that the most important mechanical properties can be described in terms
of these two quantities.


If X represents a measurable quantity like length or temperature,
then its numerical values depend on the origin and the unit of measurement.
A change of the latter means passing from X to a new variable
aX + b, where a and b are constants. Clearly Var(X + b) = Var(X),
and hence

(4.5)    $\mathrm{Var}(aX + b) = a^2\, \mathrm{Var}(X)$.


The choice of the origin and unit of measurement is to a large degree
arbitrary, and often it is most convenient to take the mean as origin
and the standard deviation as unit. We have done so in chapter VII,
when we introduced the normalized number of successes
$S_n^* = (S_n - np)/(npq)^{1/2}$. In general, if X has mean $\mu$ and variance
$\sigma^2$ ($\sigma > 0$), then $X - \mu$ has mean zero and variance $\sigma^2$, and hence the
variable

(4.6)    $X^* = \frac{X - \mu}{\sigma}$

has mean 0 and variance 1. It is called the normalized variable corresponding
to X. In the physicist’s language, the passage from X to $X^*$
would be interpreted as the introduction of dimensionless quantities.

5. COVARIANCE; VARIANCE OF A SUM 


Let X and Y be two random variables on the same sample space. 
Then X + Y and XY are again random variables, and their distribu- 
tions can be obtained by a simple rearrangement of the joint distribu- 
tion of X and Y. Our aim now is to calculate Var(X + Y). For that 
purpose we introduce the notion of covariance, which will be analyzed 
in greater detail in section 8. If the joint distribution of X and Y is 
$\{p(x_j, y_k)\}$, then the expectation of XY is given by

(5.1)    $E(XY) = \sum_{j,k} x_j y_k\, p(x_j, y_k)$,

provided, of course, that the series converges absolutely. Now
$|x_j y_k| \le (x_j^2 + y_k^2)/2$ and therefore E(XY) certainly exists if $E(X^2)$
and $E(Y^2)$ exist. In this case there exist also the expectations

(5.2)    $\mu_x = E(X)$,    $\mu_y = E(Y)$,

and the variables $X - \mu_x$ and $Y - \mu_y$ have means zero. For their
product we have from the addition rule of section 2

(5.3)    $E((X - \mu_x)(Y - \mu_y)) = E(XY) - \mu_x E(Y) - \mu_y E(X) + \mu_x \mu_y = E(XY) - \mu_x \mu_y$.

Definition. The covariance of X and Y is defined by

(5.4)    $\mathrm{Cov}(X, Y) = E((X - \mu_x)(Y - \mu_y)) = E(XY) - \mu_x \mu_y$.

This definition is meaningful whenever X and Y have finite variances.

We know from section 2 that for independent variables
E(XY) = E(X)E(Y). Hence from (5.4) we have

Theorem 1. If X and Y are independent, then Cov(X, Y) = 0.

Note that the converse is not true. For example, a glance at table 1
shows that the two variables are dependent, but their covariance vanishes
nevertheless. We shall return to this point in section 8. The
next theorem is important, and the addition rule (5.6) for independent
variables is constantly applied.


Theorem 2. If $X_1, \ldots, X_n$ are random variables with finite variances
$\sigma_1^2, \ldots, \sigma_n^2$, and $S_n = X_1 + \cdots + X_n$, then

(5.5)    $\mathrm{Var}(S_n) = \sum_{k=1}^{n} \sigma_k^2 + 2 \sum_{j<k} \mathrm{Cov}(X_j, X_k)$,

the last sum extending over each of the $\binom{n}{2}$ pairs $(X_j, X_k)$ with j < k.
In particular, if the $X_j$ are mutually independent, then the addition rule

(5.6)    $\mathrm{Var}(S_n) = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$

holds.

Proof. Put $\mu_k = E(X_k)$ and $m_n = \mu_1 + \cdots + \mu_n = E(S_n)$. Then
$S_n - m_n = \sum_k (X_k - \mu_k)$ and

(5.7)    $(S_n - m_n)^2 = \sum_k (X_k - \mu_k)^2 + 2 \sum_{j<k} (X_j - \mu_j)(X_k - \mu_k)$.

Taking expectations and applying the addition rule, we get (5.5).
Equation (5.6) follows from the preceding theorem.


Examples. (a) Binomial distribution $\{b(k; n, p)\}$. In example (3.a),
the variables $X_k$ are mutually independent. We have
$E(X_k^2) = 0^2 \cdot q + 1^2 \cdot p = p$, and $E(X_k) = p$. Hence $\sigma_k^2 = p - p^2 = pq$, and from
(5.6) we see that the variance of the binomial distribution is npq. The
same result was derived by direct computation in example (4.d).

(b) Bernoulli trials with variable probabilities. Let $X_1, \ldots, X_n$ be
mutually independent random variables such that $X_k$ assumes the
values 1 and zero with probabilities $p_k$ and $q_k = 1 - p_k$ respectively.
Then $E(X_k) = p_k$ and $\mathrm{Var}(X_k) = p_k - p_k^2 = p_k q_k$. Putting again
$S_n = X_1 + \cdots + X_n$ we have from (5.6)

(5.8)    $\mathrm{Var}(S_n) = \sum_{k=1}^{n} p_k q_k$.

As in example (1.e) the variable $S_n$ may be interpreted as the total
number of successes in n independent trials, each of which results in
success or failure. Then $p = (p_1 + \cdots + p_n)/n$ is the average probability
of success, and it seems natural to compare the present situation
to Bernoulli trials with the constant probability of success p. Such a
comparison leads to a striking result. We may rewrite (5.8) in the
form $\mathrm{Var}(S_n) = np - \sum_k p_k^2$. Next, it is easily seen (by elementary calculus
or simple induction) that among all combinations $\{p_k\}$ such that
$\sum_k p_k = np$ the sum $\sum_k p_k^2$ assumes its minimum value when all $p_k$ are
equal. It follows that, if the average probability of success p is kept
constant, $\mathrm{Var}(S_n)$ assumes its maximum value when $p_1 = \cdots = p_n = p$.
We have thus the surprising result that the variability of $p_k$, or lack of
uniformity, decreases the magnitude of chance fluctuations as measured
by the variance. For example, the number of annual fires in a community
may be treated as a random variable; for a given average
number, the variability is maximal if all households have the same
probability of fire. Given a certain average quality p of n machines,
the output will be least uniform if all machines are equal. (An application
to modern education is obvious but hopeless.)
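A small numerical instance makes the comparison concrete. The probabilities 1/10, 3/10, 5/10, 7/10 below are an illustrative choice, not taken from the text; the sketch evaluates (5.8) for them and for four equal probabilities with the same average.

```python
from fractions import Fraction


def variance_of_successes(probs):
    """Var(S_n) = sum of p_k q_k for independent trials, as in (5.8)."""
    return sum(p * (1 - p) for p in probs)


uneven = [Fraction(1, 10), Fraction(3, 10), Fraction(5, 10), Fraction(7, 10)]
average = sum(uneven) / len(uneven)    # average probability p = 2/5
equal = [average] * len(uneven)

print(variance_of_successes(uneven))   # 19/25
print(variance_of_successes(equal))    # 24/25
```

With the average probability held at 2/5, the unequal probabilities give the smaller variance, as the argument above predicts.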

(c) Card matching. A deck of n numbered cards is put into random
order so that all n! arrangements have equal probabilities. The number
of matches (cards in their natural place) is a random variable $S_n$
which assumes the values 0, 1, ..., n. Its probability distribution
was derived in chapter IV, section 4. From it the mean and variance
could be obtained, but the following way is simpler and more instructive.

Define a random variable $X_k$ which is either 1 or 0, according as
card number k is or is not at the kth place. Then $S_n = X_1 + \cdots + X_n$.
Now each card has probability 1/n to appear at the kth place. Hence
$P\{X_k = 1\} = 1/n$ and $P\{X_k = 0\} = (n - 1)/n$. Therefore $E(X_k) = 1/n$,
and it follows that $E(S_n) = 1$: the average is one match per
deck. To find $\mathrm{Var}(S_n)$ we first calculate the variance $\sigma_k^2$ of $X_k$:

(5.9)    $\sigma_k^2 = \frac{1}{n} - \left(\frac{1}{n}\right)^2 = \frac{n-1}{n^2}$.

Next we calculate $E(X_j X_k)$. The product $X_j X_k$ is 0 or 1; the latter is
true if both card number j and card number k are at their proper
places, and the probability for that is 1/n(n - 1). Hence

(5.10)    $E(X_j X_k) = \frac{1}{n(n-1)}$,    $\mathrm{Cov}(X_j, X_k) = \frac{1}{n(n-1)} - \frac{1}{n^2} = \frac{1}{n^2(n-1)}$.


4 For stronger results in the same direction see W. Hoeffding, On the distribution
of the number of successes in independent trials, Annals of Mathematical Statistics,
vol. 27 (1956), pp. 713-721.

Thus finally

(5.11)    $\mathrm{Var}(S_n) = n\, \frac{n-1}{n^2} + 2 \binom{n}{2} \frac{1}{n^2(n-1)} = 1$.

We see that both mean and variance of the number of matches are
equal to one. This result may be applied to the problem of card guessing
discussed in chapter IV, section 4. There we considered three
methods of guessing, one of which corresponds to card matching. The
second can be described as a sequence of n Bernoulli trials with probability
p = 1/n, in which case the expected number of correct guesses
is np = 1 and the variance npq = (n - 1)/n. The expected numbers
are the same in both cases, but the larger variance with the first method 
indicates greater chance fluctuations about the mean and thus promises 
a slightly more exciting game. (With more complicated decks of cards 
the difference between the two variances is somewhat larger but never 
really big.) With the last mode of guessing the subject keeps calling 
the same card; the number of correct guesses is necessarily one, and 
chance fluctuations are completely eliminated (variance 0). We see 
that the strategy of calling cannot influence the expected number of 
correct guesses but has some influence on the magnitude of chance 
fluctuations. 
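For small decks the statement that mean and variance of the number of matches both equal one can be confirmed by brute-force enumeration of all n! arrangements. The function name below is an illustrative choice; the enumeration is practical only for small n.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def matches_mean_and_variance(n):
    """Exact mean and variance of the number of matches over all n! arrangements."""
    total = total_sq = 0
    for arrangement in permutations(range(n)):
        s = sum(1 for place, card in enumerate(arrangement) if place == card)
        total += s
        total_sq += s * s
    mean = Fraction(total, factorial(n))
    return mean, Fraction(total_sq, factorial(n)) - mean ** 2


for n in range(2, 8):
    print(n, matches_mean_and_variance(n))   # mean 1 and variance 1 for each n >= 2
```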

(d) Sampling without replacement. Suppose that a population consists
of b black and g green elements, and that a random sample of size
r is taken (without possible repetitions). The number $S_r$ of black
elements in the sample is a random variable with the hypergeometric
distribution (chapter II, section 6) from which the mean and the variance
can be obtained by direct computation. However, the following
method is preferable. Define the random variable $X_k$ to assume the
values 1 or 0 according as the kth element in the sample is or is not
black ($k \le r$). For reasons of symmetry the probability that $X_k = 1$
is b/(b + g), and hence

(5.12)    $E(X_k) = \frac{b}{b+g}$,    $\mathrm{Var}(X_k) = \frac{bg}{(b+g)^2}$.

Next, if $j \ne k$, then $X_j X_k = 1$ if the jth and kth elements of the sample
are black, and otherwise $X_j X_k = 0$. The probability of $X_j X_k = 1$ is
b(b - 1)/(b + g)(b + g - 1), and therefore

(5.13)    $E(X_j X_k) = \frac{b(b-1)}{(b+g)(b+g-1)}$,    $\mathrm{Cov}(X_j, X_k) = -\frac{bg}{(b+g)^2 (b+g-1)}$.

Thus,

(5.14)    $E(S_r) = \frac{rb}{b+g}$,    $\mathrm{Var}(S_r) = \frac{rbg}{(b+g)^2} \left\{ 1 - \frac{r-1}{b+g-1} \right\}$.

In sampling with replacement we would have the same mean, but the
variance would be slightly larger, namely, $rbg/(b+g)^2$.
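The mean and variance obtained from the covariance argument can be compared with the direct computation from the hypergeometric distribution. The sketch below does so for one illustrative case (b = 7, g = 5, r = 4, values chosen here); the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeometric_moments(b, g, r):
    """Mean and variance of the number of black elements, by direct summation."""
    total = comb(b + g, r)
    dist = {k: Fraction(comb(b, k) * comb(g, r - k), total)
            for k in range(max(0, r - g), min(b, r) + 1)}
    mean = sum(k * p for k, p in dist.items())
    var = sum(k * k * p for k, p in dist.items()) - mean ** 2
    return mean, var


b, g, r = 7, 5, 4
mean, var = hypergeometric_moments(b, g, r)
size = b + g
with_replacement = Fraction(r * b * g, size ** 2)

print(mean == Fraction(r * b, size))                               # True
print(var == with_replacement * (1 - Fraction(r - 1, size - 1)))   # True
print(var < with_replacement)                                      # True
```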


6. CHEBYSHEV’S INEQUALITY ⁵


It has been pointed out that a small variance indicates that large 
deviations from the mean are improbable. This statement is made 
more precise by Chebyshev’s inequality, which is an exceedingly useful 
and handy tool. 


Theorem. Let X be a random variable with mean $\mu = E(X)$ and
variance $\sigma^2 = \mathrm{Var}(X)$. Then for any t > 0

(6.1)    $P\{|X - \mu| \ge t\} \le \frac{\sigma^2}{t^2}$.

Proof. The variance is defined in (4.3) by a series with positive
terms. Delete all terms for which $|x_j - \mu| < t$; this cannot increase
the value of the series, and hence

(6.2)    $\sigma^2 \ge \sum^* (x_j - \mu)^2 f(x_j)$

where the star indicates that the summation extends only over those j
for which $|x_j - \mu| \ge t$. It is then clear that

(6.3)    $\sum^* (x_j - \mu)^2 f(x_j) \ge t^2 \sum^* f(x_j) = t^2 P\{|X - \mu| \ge t\}$,

which proves the theorem.

Chebyshev’s inequality must be regarded as a theoretical tool rather 
than a practical method of estimation. Its importance is due to its 
universality, but no statement of great generality can be expected to 
yield sharp results in individual cases. 


Examples. (a) If X is the number scored in a throw of a true die,
then [cf. example (4.b)], $\mu = 7/2$, $\sigma^2 = 35/12$. The maximum deviation of
X from $\mu$ is $2.5 \approx 3\sigma/2$. The probability of greater deviations is zero,
whereas Chebyshev’s inequality only asserts that this probability is
smaller than 0.47.

(b) For the binomial distribution $\{b(k; n, p)\}$ we have [cf. example
(5.a)] $\mu = np$, $\sigma^2 = npq$. For large n we know that

(6.4)    $P\{|S_n - np| > x (npq)^{1/2}\} \approx 1 - \Phi(x) + \Phi(-x)$.

5 P. L. Chebyshev (1821-1894).

Chebyshev’s inequality states only that the left side is less than $1/x^2$;
this is obviously a much poorer estimate than (6.4).
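Both examples can be put into numbers. The sketch below computes the Chebyshev bound for the die and compares exact binomial tail probabilities with the bound $1/x^2$; the parameters n = 100, p = 1/2 and the values of x are illustrative choices, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def chebyshev_bound(var, t):
    """Upper bound sigma^2 / t^2 for the probability of a deviation of size t."""
    return var / (t * t)


# (a) True die: sigma^2 = 35/12 and t = 5/2 give the bound 7/15.
die_var = Fraction(35, 12)
print(float(chebyshev_bound(die_var, Fraction(5, 2))))   # 0.4666...


# (b) Binomial: exact P{|S_n - np| >= x sqrt(npq)} against the bound 1/x^2.
def binomial_tail(n, p, x):
    mu, sd = n * p, sqrt(n * p * (1 - p))
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if abs(k - mu) >= x * sd)


for x in (1.5, 2.0, 3.0):
    print(x, binomial_tail(100, 0.5, x), 1 / x ** 2)
```

In each case the exact tail probability lies well below the Chebyshev bound, which illustrates why the inequality is a theoretical tool rather than a method of estimation.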
*7. KOLMOGOROV’S INEQUALITY * 
As an example of more refined methods we prove: 


Let $X_1, \dots, X_n$ be mutually independent variables with expectations
$\mu_k = E(X_k)$ and variances $\sigma_k^2$. Put

(7.1)    $S_k = X_1 + \dots + X_k$

and

(7.2)    $m_k = E(S_k) = \mu_1 + \dots + \mu_k$,
         $s_k^2 = \mathrm{Var}(S_k) = \sigma_1^2 + \dots + \sigma_k^2$.

For every $t > 0$ the probability of the simultaneous realization of the n
inequalities

(7.3)    $|S_k - m_k| < t s_n$,    $k = 1, 2, \dots, n$,

is at least $1 - t^{-2}$.


For n = 1 this theorem reduces to Chebyshev’s inequality. For
n > 1 Chebyshev’s inequality gives the same bound for the probability
of the single relation $|S_n - m_n| < t s_n$, so that Kolmogorov’s
inequality is considerably stronger.


Proof. We want to estimate the probability x that at least one of
the inequalities (7.3) does not hold. The theorem asserts that
$x \le t^{-2}$. Define n random variables $Y_\nu$ as follows: $Y_\nu = 1$ if

(7.4)    $|S_\nu - m_\nu| \ge t s_n$

but

(7.5)    $|S_k - m_k| < t s_n$    for $k = 1, 2, \dots, \nu - 1$;

$Y_\nu = 0$ for all other sample points. In words, $Y_\nu$ equals 1 at those
points in which the $\nu$th of the inequalities (7.3) is the first to be
violated. Then at any particular sample point at most one among the
$Y_k$ is 1, and the sum $Y_1 + Y_2 + \dots + Y_n$ can assume only the values
0 or 1; it is 1 if, and only if, at least one of the inequalities (7.3) is
violated, and therefore

(7.6)    $x = P\{Y_1 + \dots + Y_n = 1\}$.

Since $Y_1 + \dots + Y_n$ is 0 or 1, we have $\sum Y_k \le 1$. Multiplying by
$(S_n - m_n)^2$ and taking expectations, we get

(7.7)    $\sum_{k=1}^{n} E(Y_k (S_n - m_n)^2) \le s_n^2$.

For an evaluation of the terms on the left we put

(7.8)    $U_k = (S_n - m_n) - (S_k - m_k) = \sum_{\nu=k+1}^{n} (X_\nu - \mu_\nu)$.

Then

(7.9)    $E(Y_k (S_n - m_n)^2) = E(Y_k (S_k - m_k)^2) + 2E(Y_k U_k (S_k - m_k)) + E(Y_k U_k^2)$.

However, $U_k$ depends only on $X_{k+1}, \dots, X_n$ while $Y_k$ and $S_k$ depend
only on $X_1, \dots, X_k$. Hence $U_k$ is independent of $Y_k (S_k - m_k)$ and
therefore $E(Y_k U_k (S_k - m_k)) = E(Y_k (S_k - m_k)) E(U_k) = 0$, since
$E(U_k) = 0$. The last term in (7.9) is non-negative. Thus from (7.9)

(7.10)    $E(Y_k (S_n - m_n)^2) \ge E(Y_k (S_k - m_k)^2)$.

But $Y_k \ne 0$ only if $|S_k - m_k| \ge t s_n$, so that
$Y_k (S_k - m_k)^2 \ge t^2 s_n^2 Y_k$. Hence, combining (7.7) and (7.10), we get

(7.11)    $s_n^2 \ge t^2 s_n^2 E(Y_1 + \dots + Y_n)$.

Since $Y_1 + \dots + Y_n$ equals either 0 or 1, the expectation to the right
equals the probability x defined in (7.6). Thus $x t^2 \le 1$ as asserted.


* This section treats a special topic and should be omitted at first reading.

6 Über die Summen zufälliger Grössen, Mathematische Annalen,
pp. 309-319, and vol. 102 (1929), pp. 484-488.
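
The difference between the two inequalities can be made concrete in the simplest case. The sketch below assumes variables $X_k = \pm 1$ with probability 1/2 each, so that $m_k = 0$ and $s_n^2 = n$; this choice and the function names are illustrative only. The probability that all n inequalities (7.3) hold is computed exactly by following the partial sums step by step, and is compared with the lower bound $1 - t^{-2}$ and with the probability of the single relation $|S_n| < t s_n$.

```python
from math import sqrt

def prob_all_partial_sums_inside(n, t):
    """Exact P{|S_k| < t*s_n for k = 1..n} for steps +1/-1 with probability 1/2.

    Here m_k = 0 and s_n^2 = n, so the barrier is t*sqrt(n).
    """
    barrier = t * sqrt(n)
    alive = {0: 1.0}   # position -> probability of reaching it with no violation so far
    for _ in range(n):
        nxt = {}
        for pos, p in alive.items():
            for step in (-1, 1):
                new = pos + step
                if abs(new) < barrier:      # inequality (7.3) still holds
                    nxt[new] = nxt.get(new, 0.0) + 0.5 * p
        alive = nxt
    return sum(alive.values())

def prob_last_sum_inside(n, t):
    """Exact P{|S_n| < t*s_n}, the single relation covered by Chebyshev."""
    barrier = t * sqrt(n)
    dist = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for pos, p in dist.items():
            for step in (-1, 1):
                nxt[pos + step] = nxt.get(pos + step, 0.0) + 0.5 * p
        dist = nxt
    return sum(p for pos, p in dist.items() if abs(pos) < barrier)

n = 100
for t in (1.5, 2.0, 3.0):
    print("t =", t,
          "bound:", 1 - t ** -2,
          "all k:", prob_all_partial_sums_inside(n, t),
          "k = n only:", prob_last_sum_inside(n, t))
```

Both probabilities exceed the common bound; the first is the smaller of the two, since it refers to the more restrictive event.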


*8. THE CORRELATION COEFFICIENT 


Let X and Y be any two random variables with means $\mu_x$ and $\mu_y$
and positive variances $\sigma_x^2$ and $\sigma_y^2$. We introduce the
corresponding normalized variables X* and Y* defined by (4.6). Their
covariance is called the correlation coefficient of X, Y and is denoted by
$\rho(X, Y)$. Thus, using (5.4),

(8.1)    $\rho(X, Y) = \mathrm{Cov}(X^*, Y^*) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$.

Clearly this correlation coefficient is independent of the origins and
units of measurements, that is, for any constants $a_1, a_2, b_1, b_2$, with
$a_1 > 0$, $a_2 > 0$, we have $\rho(a_1 X + b_1, a_2 Y + b_2) = \rho(X, Y)$.

* This section treats a special topic and may be omitted at first reading.

The use of the correlation coefficient amounts to a fancy way of
writing the covariance.$^7$ Unfortunately, the term correlation is
suggestive of implications which are not inherent in it. We know from
section 5 that $\rho(X, Y) = 0$ whenever X and Y are independent. It is
important to realize that the converse is not true. In fact, the
correlation coefficient $\rho(X, Y)$ can vanish even if Y is a function of X.


Examples. (a) Let X assume the values $\pm 1$, $\pm 2$ each with
probability 1/4. Let $Y = X^2$. The joint distribution is given by
$p(-1, 1) = p(1, 1) = p(2, 4) = p(-2, 4) = 1/4$. For reasons of symmetry
$\rho(X, Y) = 0$ even though we have a direct functional dependence of
Y on X.

(b) Let U and V be independent variables with the same distribution,
and let X = U + V, Y = U - V. Then $E(XY) = E(U^2) - E(V^2) = 0$
and E(Y) = 0. Hence Cov(X, Y) = 0 and therefore also $\rho(X, Y) = 0$.
For example, X and Y may be the sum and difference of points on two
dice. Then X and Y are either both odd or both even and therefore
dependent.
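
Both examples can be verified by direct enumeration. The following sketch, with illustrative helper names, represents a joint distribution as a table of probabilities, computes the covariance from its definition, and tests independence by comparing each joint probability with the product of the marginal probabilities.

```python
from fractions import Fraction
from itertools import product

def covariance(joint):
    """Covariance from a joint distribution given as {(x, y): probability}."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    exy = sum(p * x * y for (x, y), p in joint.items())
    return exy - ex * ey

def is_independent(joint):
    """Check p(x, y) = f(x) g(y) for every pair of possible values."""
    fx, gy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0) + p
        gy[y] = gy.get(y, 0) + p
    return all(joint.get((x, y), 0) == fx[x] * gy[y] for x in fx for y in gy)

# Example (a): X = -2, -1, 1, 2 with probability 1/4 each, and Y = X^2.
joint_a = {(x, x * x): Fraction(1, 4) for x in (-2, -1, 1, 2)}

# Example (b): sum and difference of the points on two true dice.
joint_b = {}
for u, v in product(range(1, 7), repeat=2):
    key = (u + v, u - v)
    joint_b[key] = joint_b.get(key, 0) + Fraction(1, 36)

for name, joint in (("(a)", joint_a), ("(b)", joint_b)):
    print(name, "Cov =", covariance(joint), "independent:", is_independent(joint))
```

In both cases the covariance vanishes although the variables are dependent.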


It follows that the correlation coefficient is by no means a general
measure of dependence between X and Y. However, $\rho(X, Y)$ is
connected with the linear dependence of X and Y.


Theorem. We have always $|\rho(X, Y)| \le 1$; furthermore,
$\rho(X, Y) = \pm 1$ only if there exist constants a and b such that
Y = aX + b, except, perhaps, for values of X with zero probability.


Proof. Let X* and Y* be the normalized variables. Then

(8.2)    $\mathrm{Var}(X^* \pm Y^*) = \mathrm{Var}(X^*) \pm 2\,\mathrm{Cov}(X^*, Y^*) + \mathrm{Var}(Y^*) = 2(1 \pm \rho(X, Y))$.

The left side cannot be negative; hence $|\rho(X, Y)| \le 1$. For
$\rho(X, Y) = 1$ it is necessary that $\mathrm{Var}(X^* - Y^*) = 0$ which means
that with unit probability the variable X* - Y* assumes only one value.
In this case X* - Y* = const., and hence Y = aX + const. with
$a = \sigma_y/\sigma_x$. A similar argument applies to the case
$\rho(X, Y) = -1$.


7 The physicist would define the correlation coefficient as “dimensionless
covariance.”


9. PROBLEMS FOR SOLUTION


1. Seven balls are distributed randomly in seven cells. Let $X_i$ be the
number of cells containing exactly i balls. Using the probabilities tabulated
in chapter II, section 5, write down the joint distribution of $(X_2, X_3)$.

2. Two ideal dice are thrown. Let X be the score on the first die and Y 
be the larger of two scores. (a) Write down the joint distribution of X and Y. 
(b) Find the means, the variances, and the covariance. 

3. In five tosses of a coin let X, Y, Z be, respectively, the number of heads, 
the number of head runs, the length of the largest head run. Tabulate the 32 
sample points together with the corresponding values of X, Y, and Z. By 
simple counting derive the joint distributions of the pairs (X, Y), (X, Z), (Y, Z) 
and the distributions of X + Y and XY. Find the means, variances, covari- 
ances of the variables. 

4. The random variables $X_1$ and $X_2$ are independent and have the same
geometric distribution $\{q^k p\}$, where k = 0, 1, .... Let Z be defined as the
larger of $X_1$ and $X_2$ [in symbols, $Z = \max(X_1, X_2)$]. Derive the joint
distribution of Z and $X_1$, and the distribution of Z.

5. Let $X_1$ and $X_2$ be independent random variables with Poisson
distributions $\{p(k; \lambda_1)\}$ and $\{p(k; \lambda_2)\}$. Prove that $X_1 + X_2$
has the Poisson distribution $\{p(k; \lambda_1 + \lambda_2)\}$.

6. Continuation. Show that the conditional distribution of $X_1$ given
$X_1 + X_2$ is binomial, namely

(9.1)    $P\{X_1 = k \mid X_1 + X_2 = n\} = b\left(k; n, \frac{\lambda_1}{\lambda_1 + \lambda_2}\right)$.

7. Let $X_1$ and $X_2$ be independent and have the common geometric
distribution $\{q^k p\}$ (as in problem 4). Show without calculations that the
conditional distribution of $X_1$ given $X_1 + X_2$ is uniform, that is,

(9.2)    $P\{X_1 = k \mid X_1 + X_2 = n\} = \frac{1}{n + 1}$,    $k = 0, \dots, n$.

8. Let $X_1, \dots, X_n$ be mutually independent random variables, each having
the uniform distribution $P\{X_i = k\} = 1/N$ for k = 1, 2, ..., N. Let $U_n$ be
the smallest among the $X_1, \dots, X_n$ and $V_n$ the largest. Find the
distributions of $U_n$ and $V_n$. What is the connection with the estimation
problem (3.e)?

9. In the estimation problem (3.e) find the joint distribution of the largest
and the smallest observation. Specialize to n = 2. (Hint: Calculate first
$P\{X \le r, Y \ge s\}$.)

10. Continuation. Find the conditional probability that the first two
observations are j and k, given that X = r.

11. Continuation. Find $E(X^2)$ and hence an asymptotic expression for
Var(X) as $N \to \infty$ (with n fixed).

12. Sampling inspection. Suppose that items with a probability p of being
defective are subjected to inspection in such a way that the probability of an
item being inspected is $p'$. We have four classes, namely, “acceptable and
inspected,” “acceptable but not inspected,” etc. with corresponding
probabilities $pp'$, $pq'$, $p'q$, $qq'$ where $q = 1 - p$, $q' = 1 - p'$. We are
concerned with double Bernoulli trials [see example VI(9.c)]. Let N be the
number of items passing the inspection desk (both inspected and uninspected)
before the first defective is found, and let K be the (undiscovered) number of
defectives among them. Find the joint distributions of N and K and the
marginal distributions.

13. Continuation. Find $E\left(\frac{K}{N + 1}\right)$ and Cov(K, N). [In industrial
practice the discovered defective item is replaced by an acceptable one so that
K/(N + 1) is the fraction of defectives and measures the quality of the lot.
Note that $E\left(\frac{K}{N + 1}\right)$ is not E(K)/E(N + 1).]


14. In a sequence of Bernoulli trials let X be the length of the run (of either 
successes or failures) started by the first trial. Find the distribution of X, 
E(X), Var(X). 

15. Continuation. Let Y be the length of the second run. Find the distribu- 
tion of Y, E(Y), Var(Y), and the joint distribution of X, Y. 


16. If two random variables X and Y assume only two values each, and if
Cov(X, Y) = 0, then X and Y are independent.


17. Birthdays. For a group of n people find the expected number of days 
of the year which are birthdays of exactly k people. (Assume 365 days and 
that all arrangements are equally probable.) 


18. Continuation. Find the expected number of multiple birthdays. How 
large should n be to make this expectation exceed 1? 


19. A man with n keys wants to open his door and tries the keys independ- 
ently and at random. Find the mean and variance of the number of trials 
(a) if unsuccessful keys are not eliminated from further selections; (b) if they 
are. (Assume that only one key fits the door. The exact distributions are 
given in chapter II, section 7, but are not required for the present problem.) 

20. Let (X, Y) be random variables whose joint distribution is the trinomial 
defined by (1.9). Find E(X), Var(X), and Cov(X, Y) (a) by direct computa- 
tion, (b) by representing X and Y as sums of n variables each and using the 
methods of section 5. 


21. Find the covariance of the number of ones and sixes in n throws of a 
die. 


22. In the animal trapping problem VI, 24 prove that the expected number
of animals trapped at the $\nu$th trapping is $nqp^{\nu - 1}$.

23. If X has the geometric distribution $P\{X = k\} = q^k p$ (where k = 0, 1,
...), show that $\mathrm{Var}(X) = qp^{-2}$. Conclude that the negative binomial
distribution {f(k; r, p)} has variance $rqp^{-2}$ provided r is a positive
integer. Prove by direct calculation that the statement remains true for all
r > 0.

24. In the waiting time problem (3.d) prove that

$\mathrm{Var}(S_r) = N\left\{\frac{1}{(N-1)^2} + \frac{2}{(N-2)^2} + \dots + \frac{r-1}{(N-r+1)^2}\right\}$.

Hint: Use the variance of the geometric distribution obtained in problem
23. Incidentally, as $N \to \infty$ we find $N^{-2}\,\mathrm{Var}(S_N) \to \pi^2/6$.

25. Continuation. Let $Y_r$ be the number of drawings required to include
r preassigned elements (instead of any r different elements as in the text).
Find $E(Y_r)$ and $\mathrm{Var}(Y_r)$. (Note: The exact distribution of $Y_r$ was
found in problem II(11.12) but is not required for the present purpose.)

26.$^8$ A large number, N, of people are subject to a blood test. This can be
administered in two ways. (i) Each person can be tested separately. In this 
case N tests are required. (ii) The blood samples of k people can be pooled 
and analyzed together. If the test is negative, this one test suffices for the k 
people. If the test is positive, each of the k persons must be tested separately, 
and in all k + 1 tests are required for the k people. 

Assume the probability p that the test is positive is the same for all people 
and that people are stochastically independent. 

(a) What is the probability that the test for a pooled sample of k people will 
be positive? 

(b) What is the expected value of the number, X, of tests necessary under
plan (ii)?

(c) Which k will minimize the expected number of tests under plan (ii)? 
Do not try numerical evaluations, since the problem leads to a rather cumber- 
some equation for k. 


the proportion $p_1 : p_2 : \dots : p_r$. A random sample of size n is taken
with replacement. Find the expected number of classes not represented in the
sample.

28. Let X be the number of $\alpha$ runs in a random arrangement of $r_1$ alphas
and $r_2$ betas. The distribution of X is given in problem II(11.23). Find E(X)
and Var(X).

29. In Polya’s urn scheme [V(2.c)] let $X_n$ be one or zero according as the
nth trial results in black or red. Prove $\rho(X_n, X_m) = c/(b + r + c)$ for
$n \ne m$.

30. Continuation. Let $S_n$ be the total number of black balls extracted in
the first n drawings (that is, $S_n = X_1 + \dots + X_n$). Find $E(S_n)$ and
$\mathrm{Var}(S_n)$. (Use problems V, 19 and V, 20; verify the result by means of
the recursion formula of problem V, 22.)

31. Stratified sampling. A city has n blocks of which $n_j$ have $x_j$
inhabitants each ($n_1 + n_2 + \dots = n$). Let $m = \sum n_j x_j / n$ be the mean
number of inhabitants per block and put $a^2 = \sum n_j x_j^2 / n - m^2$. In
sampling without replacement r blocks are selected at random, and in each the
inhabitants are counted. Let $X_1, \dots, X_r$ be the respective number of
inhabitants. Show that

$E(X_1 + \dots + X_r) = mr$,    $\mathrm{Var}(X_1 + \dots + X_r) = \frac{a^2 r (n - r)}{n - 1}$.

(In sampling with replacement the variance would be larger, namely, $a^2 r$.)
32. Length of random chains.$^9$ A chain in the x, y-plane consists of n
links, each of unit length. The angle between two consecutive links is
$\pm\alpha$ where $\alpha$ is a positive constant; each possibility has probability
1/2, and the successive angles are mutually independent. The distance $L_n$
from the beginning to the end of the chain is a random variable, and we wish to
prove that

(9.3)    $E(L_n^2) = n\,\frac{1 + \cos\alpha}{1 - \cos\alpha} - 2\cos\alpha\,\frac{1 - \cos^n\alpha}{(1 - \cos\alpha)^2}$.

Without loss of generality the first link may be assumed to lie in the
direction of the positive x-axis. The angle between the kth link and the
positive x-axis is a random variable $S_{k-1}$ where $S_0 = 0$,
$S_k = S_{k-1} + X_k\alpha$ and the $X_k$ are mutually independent variables,
assuming the values $\pm 1$ with probability 1/2. The projections on the two
axes of the kth link are $\cos S_{k-1}$ and $\sin S_{k-1}$. Hence for $n \ge 1$

(9.4)    $L_n^2 = \left(\sum_{k=0}^{n-1} \cos S_k\right)^2 + \left(\sum_{k=0}^{n-1} \sin S_k\right)^2$.

Prove by induction successively for m < n

(9.5)    $E(\cos S_n) = \cos^n\alpha$,    $E(\sin S_n) = 0$;
(9.6)    $E((\cos S_m)(\cos S_n)) = \cos^{n-m}\alpha \cdot E(\cos^2 S_m)$;
(9.7)    $E((\sin S_m)(\sin S_n)) = \cos^{n-m}\alpha \cdot E(\sin^2 S_m)$;
(9.8)    $E(L_{n+1}^2) - E(L_n^2) = 1 + 2\cos\alpha\,\frac{1 - \cos^n\alpha}{1 - \cos\alpha}$

(with $L_0 = 0$) and hence finally (9.3).


8 This problem is based on a new technique developed during World War II.
See R. Dorfman, The detection of defective members of large populations, Annals
of Mathematical Statistics, vol. 14 (1943), pp. 436-440. In army practice, plan (ii)
introduced up to 80 per cent savings.

9 This is the two-dimensional analogue to the problem of length of long polymer
molecules in chemistry. The problem illustrates applications to random variables
which are not expressible as sums of simple variables.
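
For short chains the expectation in problem 32 can be checked by brute force. The sketch below is not a proof and its function names are illustrative: it enumerates all $2^{n-1}$ equally likely sequences of signs, builds the chain link by link as described in the problem, averages the squared distance between the end points, and compares the result with the closed expression (9.3).

```python
from itertools import product
from math import cos, sin

def mean_square_length_exact(n, alpha):
    """E(L_n^2) by enumerating all 2^(n-1) equally likely sign sequences."""
    total = 0.0
    count = 0
    for signs in product((-1, 1), repeat=n - 1):
        angle = 0.0                 # S_0 = 0: first link along the positive x-axis
        x, y = 1.0, 0.0             # projections of the first link
        for s in signs:
            angle += s * alpha      # S_k = S_{k-1} + X_k * alpha
            x += cos(angle)
            y += sin(angle)
        total += x * x + y * y      # L_n^2 as in (9.4)
        count += 1
    return total / count

def mean_square_length_formula(n, alpha):
    """Right side of (9.3)."""
    c = cos(alpha)
    return n * (1 + c) / (1 - c) - 2 * c * (1 - c ** n) / (1 - c) ** 2

for n in (1, 2, 5, 10):
    print(n, mean_square_length_exact(n, 0.7), mean_square_length_formula(n, 0.7))
```

For n = 2 both expressions reduce to $2(1 + \cos\alpha)$, which is easily verified by hand.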


33. A sequence of Bernoulli trials is continued as long as necessary to obtain
r successes, where r is a fixed integer. Let X be the number of trials required.
Find$^{10}$ E(r/X). (The definition leads to infinite series for which a finite
expression can be obtained.)

34. In a random placement of r balls into n cells the probability of finding
exactly m cells empty satisfies the recursion formula II(11.8). Let $m_r$ be the
expected number of occupied cells. From the recursion formula prove that

$m_{r+1} = 1 + (1 - n^{-1})\,m_r$,  and conclude  $m_r = n[1 - (1 - n^{-1})^r]$.


35. Let $S_n$ be the number of successes in n Bernoulli trials. Prove

$E(|S_n - np|) = 2\nu q\, b(\nu; n, p)$

where $\nu$ is the integer such that $np < \nu \le np + 1$.


36. Let $\{X_k\}$ be a sequence of mutually independent random variables with
a common distribution. Suppose that the $X_k$ assume only positive values and
that $E(X_k) = a$ and $E(X_k^{-1}) = b$ exist. Let $S_n = X_1 + \dots + X_n$. Prove
that $E(S_n^{-1})$ is finite and that $E(X_k/S_n) = 1/n$ for k = 1, 2, ..., n.


10 This example illustrates the effect of optional stopping. If the number n of
trials is fixed, the ratio of the number N of successes to the number n of trials is
a random variable whose expectation is p. It is often erroneously assumed that
the same is true in our example where the number r of successes is fixed and the
number of trials depends on chance. If p = 1/2 and r = 2, then E(2/X) = 0.614
instead of 0.5; for r = 3 we find E(3/X) = 0.579.
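
The numerical values quoted in the footnote to problem 33 can be reproduced by summing the defining series. The sketch below assumes, as is standard, that the number X of trials up to the rth success has the distribution $P\{X = n\} = \binom{n-1}{r-1} p^r q^{n-r}$ for n = r, r + 1, ...; the function name and the number of terms retained are illustrative choices.

```python
from math import comb, log

def expected_ratio(r, p, terms=2000):
    """Partial sum of E(r/X) = sum over n >= r of (r/n) C(n-1, r-1) p^r q^(n-r)."""
    q = 1.0 - p
    total = 0.0
    for n in range(r, r + terms):
        total += (r / n) * comb(n - 1, r - 1) * p ** r * q ** (n - r)
    return total

print("r = 2:", expected_ratio(2, 0.5))   # compare with 2(1 - log 2)
print("r = 3:", expected_ratio(3, 0.5))
print("2(1 - log 2) =", 2 * (1 - log(2)))
```

For p = 1/2 and r = 2 the series can be summed in closed form and equals 2(1 - log 2), in agreement with the value 0.614.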

37. Continuation.$^{11}$ Prove that

$E\left(\frac{S_m}{S_n}\right) = \frac{m}{n}$,    if $m \le n$,

$E\left(\frac{S_m}{S_n}\right) = 1 + (m - n)\,a\,E(S_n^{-1})$,    if $m \ge n$.

38. Let $X_1, \dots, X_n$ be mutually independent random variables with a
common distribution; let its mean be m, its variance $\sigma^2$. Let
$\bar{X} = (X_1 + \dots + X_n)/n$. Prove that$^{12}$

$\frac{1}{n - 1}\,E\left(\sum_{k=1}^{n} (X_k - \bar{X})^2\right) = \sigma^2$.

39. Let $X_1, \dots, X_n$ be mutually independent random variables. Let U be
a function of $X_1, \dots, X_k$ and V a function of $X_{k+1}, \dots, X_n$ (k < n).
Prove that U and V are mutually independent random variables.

40. Generalized Chebyshev inequality. Let $\phi(x) > 0$ for x > 0 be
monotonically increasing and suppose that $E(\phi(|X|)) = M$ exists. Prove that

$P\{|X| \ge t\} \le \frac{M}{\phi(t)}$.

41. Schwarz inequality. For any two random variables with finite variances
one has $E^2(XY) \le E(X^2)E(Y^2)$. Prove this from the fact that the quadratic
polynomial $E((tX + Y)^2)$ is non-negative.


11 The observation that 37 can be proved by introducing 36 is due to K. L. Chung.

12 This can be expressed by saying that $\sum (X_k - \bar{X})^2/(n - 1)$ is an
unbiased estimator of $\sigma^2$.


CHAPTER X 


Laws of Large Numbers 


1. IDENTICALLY DISTRIBUTED VARIABLES 


The limit theorems for Bernoulli trials derived in chapters VII and 
VIII are special cases of general limit theorems which cannot be treated 
in this volume. However, we shall here discuss at least some cases of 
the law of large numbers in order to reveal a new aspect of the expecta- 
tion of a random variable. 

The connection between Bernoulli trials and the theory of random
variables becomes clearer when we consider the dependence of the
number $S_n$ of successes on the number n of trials. With each trial $S_n$
increases by 1 or 0, and we can write

(1.1)    $S_n = X_1 + \dots + X_n$,

where the random variable $X_k$ equals 1 if the kth trial results in success
and zero otherwise. Thus $S_n$ is a sum of n mutually independent random
variables, each of which assumes the values 1 and 0 with probabilities
p and q. From this it is only one step to consider sums of the
form (1.1) where the $X_k$ are mutually independent variables with an
arbitrary distribution. The (weak) law of large numbers of chapter
VI, section 4, states that for large n the average proportion of
successes $S_n/n$ is likely to lie near p. This is a special case of the following


Law of Large Numbers. Let $\{X_k\}$ be a sequence of mutually
independent random variables with a common distribution. If the
expectation $\mu = E(X_k)$ exists, then for every $\epsilon > 0$ as $n \to \infty$

(1.2)    $P\left\{\left|\frac{X_1 + \dots + X_n}{n} - \mu\right| > \epsilon\right\} \to 0$;

in words, the probability that the average $S_n/n$ will differ from the
expectation by less than an arbitrarily prescribed $\epsilon$ tends to one.

In this generality the theorem was first proved by Khintchine.$^1$
Older proofs had to introduce the unnecessary restriction that the
variance $\mathrm{Var}(X_k)$ should also be finite.$^2$ For this case, however,
there exists a much more precise result which generalizes the
DeMoivre-Laplace limit theorem for Bernoulli trials, namely the


Central Limit Theorem. Let $\{X_k\}$ be a sequence of mutually
independent random variables with a common distribution. Suppose that
$\mu = E(X_k)$ and $\sigma^2 = \mathrm{Var}(X_k)$ exist and let
$S_n = X_1 + \dots + X_n$. Then for every fixed $\beta$

(1.3)    $P\left\{\frac{S_n - n\mu}{\sigma n^{1/2}} < \beta\right\} \to \Phi(\beta)$,

where $\Phi(x)$ is the normal distribution introduced in chapter VII,
section 1. This theorem is due to Lindeberg;$^3$ Ljapunov and other authors
had previously proved it under more restrictive conditions. It must be
understood that this theorem is only a special case of a much more
general theorem whose formulation and proof are deferred to the
second volume. Here we note that (1.3) is stronger than (1.2), since it
gives an estimate for the probability that the discrepancy
$\left|\frac{1}{n} S_n - \mu\right|$ is larger than $\sigma/n^{1/2}$. On the other
hand, the law of large numbers (1.2) holds even when the random variables
$X_k$ have no finite variance so that it is more general than the central
limit theorem. For this reason we shall give an independent proof of the
law of large numbers, but first we illustrate the two limit theorems.


Examples. (a) In a sequence of independent throws of a symmetric
die let $X_k$ be the number scored at the kth throw. Then
$E(X_k) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5$, and
$\mathrm{Var}(X_k) = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)/6 - (3.5)^2 = 35/12$.
The law of large numbers states that for large n the average score
$S_n/n$ is likely to be near 3.5. The central limit theorem states that the
probability of $|S_n - 3.5n| < \alpha(35n/12)^{1/2}$ is about
$\Phi(\alpha) - \Phi(-\alpha)$. For n = 1000 and $\alpha = 1$ we find that there
is roughly probability 0.68 that $3450 < S_n < 3550$. Choosing for $\alpha$
the median value $\alpha = 0.6744$, we find that there are roughly equal
chances that $S_n$ lies within or without the interval 3500 ± 36.


1 A. Khintchine, Sur la loi des grands nombres, Comptes rendus de l’Académie des
Sciences, vol. 189 (1929), pp. 477-479. Incidentally, the reader should observe
the warning given in connection with the law of large numbers for Bernoulli trials
at the end of chapter VI, section 4.

2 A. Markov showed that the existence of $E(|X_k|^{1+a})$ for some a > 0 suffices.

3 J. W. Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der
Wahrscheinlichkeitsrechnung, Mathematische Zeitschrift, vol. 15 (1922), pp. 211-225.
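
The figures of the die example follow from (1.3) by simple arithmetic. The sketch below, with illustrative names, evaluates the half-width $\alpha(35n/12)^{1/2}$ of the interval around 3.5n and the approximating probability $\Phi(\alpha) - \Phi(-\alpha)$, the normal distribution function being expressed through the error function.

```python
from math import erf, sqrt

def phi(x):
    """Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def die_sum_interval(n, alpha):
    """Half-width alpha*sqrt(35n/12) and the approximate probability of the interval."""
    half_width = alpha * sqrt(35.0 * n / 12.0)
    return half_width, phi(alpha) - phi(-alpha)

for alpha in (1.0, 0.6744):
    half_width, prob = die_sum_interval(1000, alpha)
    print("alpha =", alpha, "half-width:", half_width, "probability:", prob)
```

For $\alpha = 1$ the half-width is about 54, which the text rounds to 50; for $\alpha = 0.6744$ it is about 36.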

(b) Sampling. Suppose that in a population of N families there are
$N_k$ families with exactly k children (k = 0, 1, ...; $\sum N_k = N$). For a
family chosen at random, the number of children is a random variable
which assumes the value $\nu$ with probability $p_\nu = N_\nu/N$. A sample of
size n with replacement represents n independent random variables or
“observations” $X_1, \dots, X_n$, each with the same distribution; $S_n/n$ is
the sample average. The law of large numbers tells us that for
sufficiently large random samples the sample average is likely to be near
$\mu = \sum \nu p_\nu = \sum \nu N_\nu/N$, namely the population average. The
central limit theorem permits us to estimate the probable magnitude of the
discrepancy and to determine the sample size necessary for reliable
estimates. In practice both $\mu$ and $\sigma^2$ are unknown. However, it is
usually easy to obtain a preliminary estimate of $\sigma^2$, and it is always
possible to keep to the safe side. If it is desired that there be
probability 0.99 or better that the sample average $S_n/n$ differ from the
unknown population mean $\mu$ by less than 1/10, then the sample size should
be such that

(1.4)    $P\left\{\left|\frac{S_n - n\mu}{n}\right| < \frac{1}{10}\right\} \ge 0.99$.

The root of $\Phi(x) - \Phi(-x) = 0.99$ is x = 2.57..., and hence n should
satisfy $n^{1/2}/(10\sigma) \ge 2.57$ or $n \ge 660\sigma^2$. A cautious
preliminary estimate of $\sigma^2$ gives us an idea of the required sample
size. Similar situations occur frequently. Thus when the experimenter takes
the mean of n measurements he, too, relies on the law of large numbers and
uses a sample mean as an estimate for an unknown theoretical expectation.
The reliability of this estimate can be judged only in terms of $\sigma^2$, and
usually we are compelled to use rather crude estimates for $\sigma^2$.
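
The sample size in (1.4) is obtained by solving $\Phi(x) - \Phi(-x) = 0.99$ and squaring. The sketch below finds the root by bisection; the function names and the default tolerance 1/10 and level 0.99 are illustrative parameters taken from the example, not a general recipe.

```python
from math import ceil, erf, sqrt

def phi(x):
    """Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_sided_quantile(level, lo=0.0, hi=10.0):
    """Root x of Phi(x) - Phi(-x) = level, found by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi(mid) - phi(-mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def required_sample_size(sigma2, tolerance=0.1, level=0.99):
    """Smallest n with sqrt(n) * tolerance / sigma >= x, where x is the root above."""
    x = two_sided_quantile(level)
    return ceil(sigma2 * (x / tolerance) ** 2)

print("root x:", two_sided_quantile(0.99))
print("n for sigma^2 = 1:", required_sample_size(1.0))
print("n for sigma^2 = 2.5:", required_sample_size(2.5))
```

With the root carried to more decimals (about 2.576) the factor comes out near 664 rather than 660; the difference is immaterial in view of the crude estimates of $\sigma^2$ on which the calculation rests.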

(c) The Poisson distribution. In chapter VII, section 4, we found
that for large $\lambda$ the Poisson distribution $\{p(k; \lambda)\}$ can be
approximated by the normal distribution. This is really a direct
consequence of the central limit theorem. Suppose that the variables $X_k$
have a Poisson distribution $\{p(k; \gamma)\}$. Then $S_n$ has a Poisson
distribution $\{p(k; n\gamma)\}$ with mean and variance equal to $n\gamma$.
Writing $\lambda$ for $n\gamma$, we conclude that as $n \to \infty$

(1.5)    $\sum_{k < \lambda + \beta\lambda^{1/2}} e^{-\lambda}\,\frac{\lambda^k}{k!} \to \Phi(\beta)$,

the summation extending over all k up to $\lambda + \beta\lambda^{1/2}$. It is now
obvious that (1.5) holds also when $\lambda$ approaches $\infty$ in an arbitrary
manner. This theorem is used in the theory of summability of divergent series
and is of general interest; estimates of the difference of the two sides
in (1.5) are available from the general theory.
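
The quality of the approximation (1.5) for moderate $\lambda$ can be judged by direct summation. In the sketch below (names illustrative) the Poisson terms are generated recursively, each from the preceding one, and added as long as $k < \lambda + \beta\lambda^{1/2}$; the result is printed next to $\Phi(\beta)$.

```python
from math import erf, exp, sqrt

def phi(x):
    """Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def poisson_partial_sum(lam, beta):
    """Left side of (1.5): sum of e^(-lam) lam^k / k! over k < lam + beta*sqrt(lam)."""
    bound = lam + beta * sqrt(lam)
    term = exp(-lam)          # the term for k = 0
    total = 0.0
    k = 0
    while k < bound:
        total += term
        k += 1
        term *= lam / k       # passes from the term k-1 to the term k
    return total

for lam in (25.0, 100.0, 400.0):
    for beta in (-1.0, 0.0, 1.0):
        print("lambda =", lam, "beta =", beta,
              "sum:", poisson_partial_sum(lam, beta), "Phi:", phi(beta))
```

The recursion avoids large factorials, but $e^{-\lambda}$ underflows for very large $\lambda$, so the sketch is meant for moderate values only.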


Note on Variables without Expectation 


Both the law of large numbers and the central limit theorem become
meaningless if the expectation $\mu$ does not exist, but they can be
replaced by more general theorems supplying the same sort of information.
In the modern theory variables without expectation play an important
role and many waiting and recurrence times in physics turn out
to be of this type. This is true even of the simple coin-tossing game.

Suppose that n coins are tossed one by one. For the kth coin let $X_k$
be the waiting time up to the first equalization of the accumulated
numbers of heads and tails. The $X_k$ are mutually independent random
variables with a common distribution: each $X_k$ assumes only even
positive values and $P\{X_k = 2r\} = f_{2r}$, with the probability
distribution $\{f_{2r}\}$ defined in III(4.2). According to theorem 3 of
chapter III, section 4, the distribution of the sum
$S_n = X_1 + \dots + X_n$ is given by

(1.6)    $P\{S_n = 2r\} = f_{2r}^{(n)}$

with $f_{2r}^{(n)}$ defined in III(4.11). In chapter III, section 8(c), it was
shown that as $n \to \infty$

(1.7)    $P\{S_n < n^2 x\} \to 2[1 - \Phi(x^{-1/2})]$.

We have here a limit theorem of the same character as the central limit
theorem with the remarkable difference that this time the variable
$S_n/n^2$, rather than $S_n/n$, possesses a limiting distribution.

In physical language the $X_k$ represent independent measurements on
the same quantity, and the limit theorem asserts that, in probability,
the average $S_n/n$ increases linearly with n. The surprising
consequences of this behavior were discussed in chapter III.$^4$


*2. PROOF OF THE LAW OF LARGE NUMBERS

We proceed in two steps. First assume that $\sigma^2 = \mathrm{Var}(X_k)$ exists
and note that here $\mathrm{Var}(S_n) = n\sigma^2$, by the addition rule IX(5.6).
According to the Chebyshev inequality IX(6.1), we have for every
t > 0

(2.1)    $P\{|S_n - n\mu| > t\} \le \frac{n\sigma^2}{t^2}$.

For $t > \epsilon n$ the left side is less than $\sigma^2/(\epsilon^2 n)$, which
tends to zero. This accomplishes the proof.


4 For an analogue to the law of large numbers in a case of variables without
finite expectation see section 4 and problem 13.

* This section treats a special topic and may be omitted at first reading.

Next we drop the restriction that $\mathrm{Var}(X_k)$ exists. This case is
reduced to the preceding one by the method of truncation which is an
important standard tool. Define two new collections of random variables
depending on the $X_k$ as follows:

(2.2)    $U_k = X_k$,  $V_k = 0$    if $|X_k| \le \epsilon n$;
         $U_k = 0$,  $V_k = X_k$    if $|X_k| > \epsilon n$.

Here k = 1, ..., n and $\epsilon > 0$ is fixed. Then identically

(2.3)    $X_k = U_k + V_k$.

If $\{f(x_j)\}$ is the common probability distribution of the variables $X_k$,
the sum

(2.4)    $\sum |x_j| f(x_j) = A$

is finite since $\mu = E(X_k)$ was assumed to exist. Now

(2.5)    $\mu'_n = E(U_k) = \sum_{|x_j| \le \epsilon n} x_j f(x_j)$,

the summation extending over those j for which $|x_j| \le \epsilon n$. Clearly
$\mu'_n \to \mu$ as $n \to \infty$, and hence for all n sufficiently large and for
arbitrary $\delta > 0$

(2.6)    $|\mu'_n - \mu| < \delta$.

Furthermore, from (2.5) and (2.4),

(2.7)    $\mathrm{Var}(U_k) \le E(U_k^2) \le \epsilon n \sum |x_j| f(x_j) = \epsilon A n$.

The $U_k$ are mutually independent, and their sum $U_1 + U_2 + \dots + U_n$
can be treated exactly as the $X_k$ in the case of finite variances; applying
the Chebyshev inequality, we get the following analogue to (2.1)

(2.8)    $P\left\{\left|\frac{U_1 + \dots + U_n}{n} - \mu'_n\right| > \delta\right\} \le \frac{\mathrm{Var}(U_k)}{n\delta^2} \le \frac{\epsilon A}{\delta^2}$.

In view of (2.6) this implies

(2.9)    $P\left\{\left|\frac{U_1 + \dots + U_n}{n} - \mu\right| > 2\delta\right\} \le \frac{\epsilon A}{\delta^2}$.

Note that there is a large probability that $V_k = 0$. In fact

(2.10)    $P\{V_k \ne 0\} = \sum_{|x_j| > \epsilon n} f(x_j) \le \frac{1}{\epsilon n} \sum_{|x_j| > \epsilon n} |x_j| f(x_j)$,

and the last sum tends to 0 with increasing n. Therefore for n
sufficiently large

(2.11)    $P\{V_k \ne 0\} \le \frac{\epsilon}{n}$

and hence, since the sum $V_1 + \dots + V_n$ can differ from zero only if at
least one $V_k$ does, by the basic inequality (7.6)

(2.12)    $P\{V_1 + \dots + V_n \ne 0\} \le \epsilon$.

Now $S_n = (U_1 + \dots + U_n) + (V_1 + \dots + V_n)$, and therefore from
(2.9) and (2.12)

(2.13)    $P\left\{\left|\frac{S_n}{n} - \mu\right| > 2\delta\right\} \le P\left\{\left|\frac{U_1 + \dots + U_n}{n} - \mu\right| > 2\delta\right\} + P\{V_1 + \dots + V_n \ne 0\} \le \frac{\epsilon A}{\delta^2} + \epsilon$.

Since $\epsilon$ and $\delta$ are arbitrary, the right side can be made arbitrarily
small, and this proves the assertion.
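
The mechanism of the truncation method can be watched on simulated data. The sketch below rests on assumptions that are not part of the text: the observations are drawn from a continuous Pareto law with exponent 1.5 (mean 3, infinite variance), whereas the proof above is formulated for discrete variables, and the seed, the sample size, and the names are arbitrary. Each observation is split as in (2.2), and the number of non-vanishing $V_k$ is counted.

```python
import random

def truncate(xs, eps):
    """Split each X_k into U_k + V_k as in (2.2), with threshold eps * n."""
    threshold = eps * len(xs)
    us = [x if abs(x) <= threshold else 0.0 for x in xs]
    vs = [0.0 if abs(x) <= threshold else x for x in xs]
    return us, vs

def pareto_tail(threshold, shape=1.5):
    """P{X > threshold} for the assumed Pareto law (valid for threshold >= 1)."""
    return threshold ** (-shape)

rng = random.Random(12345)      # fixed seed so that the run is reproducible
n, eps = 10_000, 0.1
xs = [rng.paretovariate(1.5) for _ in range(n)]   # mean 3, infinite variance
us, vs = truncate(xs, eps)

print("sample average S_n/n:", sum(xs) / n)
print("average of the U_k:  ", sum(us) / n)
print("number of V_k != 0:  ", sum(1 for v in vs if v != 0.0))
print("n * P{V_k != 0}:     ", n * pareto_tail(eps * n))
```

The last quantity is the bound for the probability in (2.12) under the assumed law; it decreases like $n^{-1/2}$ and falls below $\epsilon = 0.1$ only for n of the order of $10^5$, which illustrates the words “for n sufficiently large.”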


3. THE THEORY OF “FAIR” GAMES 


For a further analysis of the implications of the law of large numbers 
we shall use the time-honored terminology of gamblers, but our dis- 
cussion bears equally on less frivolous applications, and our two basic 
assumptions are more realistic in statistics and physics than in gambling 
halls. First, we shall assume that our gambler possesses an unlimited 
capital so that no loss can force a termination of the game. (Dropping 
this assumption leads to the problem of the gambler’s ruin, which 
from the very beginning has intrigued students of probability. It is 
of importance in Wald’s sequential analysis and in the theory of
stochastic processes, and will be taken up in chapter XIV.) Second, we
shall assume that the gambler does not have the privilege of optional
stopping; the number n of trials must be fixed in advance independently
of the development of the game. In reality a player blessed with an
unlimited capital would wait for a run of good luck and quit at an
opportune moment. He is not interested in the probable state at a
prescribed moment, but only in the maximal fluctuations in the long run.
Light is shed on this problem by the law of the iterated logarithm
rather than by the law of large numbers (cf. chapter VIII, section 5).

The random variable $X_k$ will be interpreted as the (positive or
negative) gain at the kth trial of a player who keeps playing the same type
of game of chance. The sum $S_n = X_1 + \dots + X_n$ is the accumulated
gain in n independent trials. If the player pays for each trial an
entrance fee $\mu'$ (not necessarily positive), then $n\mu'$ represents the
accumulated entrance fees, and $S_n - n\mu'$ the accumulated net gain. The
law of large numbers applies when $\mu = E(X_k)$ exists. It says roughly
that for sufficiently large n the difference $S_n - n\mu$ is likely to be small
in comparison to n. Therefore, if the entrance fee $\mu'$ is smaller than
$\mu$, then, for large n, the player is likely to have a positive gain of the
order of magnitude $n(\mu - \mu')$. For the same reason an entrance fee
$\mu' > \mu$ is practically sure to lead to a loss. In short, the case
$\mu' < \mu$ is favorable to the player, while $\mu' > \mu$ is unfavorable.

Note that nothing is said about the case $\mu' = \mu$. The only possible
conclusion in this case is that, for n sufficiently large, the accumulated
gain or loss $S_n - n\mu$ will with overwhelming probability be small in
comparison with n. It is not stated whether $S_n - n\mu$ is likely to be
positive or negative, that is, whether the game is favorable or unfavorable.
This was overlooked in the classical theory which called $\mu' = \mu$ a
“fair” price and a game with $\mu' = \mu$ “fair.” Much harm was done by
the misleading suggestive power of this name. It must be understood
that a “fair” game may be distinctly favorable or unfavorable to the
player.

It is clear that “normally” not only $E(X_k)$ but also $\mathrm{Var}(X_k)$ exists.
In this case the law of large numbers is supplemented by the central
limit theorem, and the latter tells us that, with a “fair” game, the
long-run net gain $S_n - n\mu$ is likely to be of the order of magnitude
$n^{1/2}$ and that for large n there are about equal odds for this net gain
to be positive or negative. Thus, when the central limit theorem applies,
the term “fair” appears justified, but even in this case we deal with a
limit theorem with emphasis on the words “long run.”

For illustration, consider a slot machine where the player has a
probability of $10^{-6}$ to win $10^6 - 1$ dollars, and the alternative of losing
the entrance fee $\mu' = 1$. Here we have Bernoulli trials, and the game is
“fair.” In a million trials the player pays as many dollars in entrance
fees. He may hit the jackpot 0, 1, 2, ... times. We know from the
Poisson approximation to the binomial distribution that, with an
accuracy to several decimal places, the probability of hitting the jackpot
exactly $k$ times is $e^{-1}/k!$. Thus the player has probability 0.368...
to lose a million, and the same probability of barely recovering his
expenses; he has probability 0.184... to gain exactly one million, etc.
Here $10^6$ trials are equivalent to one single trial in a game with the
gain distributed according to a Poisson distribution (which could be
realized by matching two large decks of cards; cf. chapter IV, section 4).
Obviously the law of large numbers is operationally meaningless in such
situations. Now all fire, automobile, and similar insurance is of the
described type; the risk involves a huge sum, but the corresponding
probability is very small. Moreover, the insured plays ordinarily only
one trial per year, so that the number $n$ of trials never grows large.
For him the game is necessarily “unfair,” and yet it is usually
economically advantageous; the law of large numbers is of no relevance to him.
As for the company, it plays a large number of games, but because of
the large variance the chance fluctuations are pronounced. The
premiums must be fixed so as to preclude a huge loss in any specific year,
and hence the company is concerned with the ruin problem rather
than the law of large numbers.
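
The slot-machine figures quoted above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the exact binomial probabilities are compared with the Poisson values $e^{-1}/k!$.

```python
from math import comb, exp, factorial

TRIALS = 10**6        # number of plays, one dollar entrance fee each
P_JACKPOT = 1e-6      # probability of a jackpot on a single play


def binomial_prob(k, n=TRIALS, p=P_JACKPOT):
    """Exact probability of exactly k jackpots in n Bernoulli trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_prob(k, lam=TRIALS * P_JACKPOT):
    """Poisson approximation; here lam = n*p = 1, giving exp(-1)/k!."""
    return exp(-lam) * lam**k / factorial(k)


def net_gain(k, n=TRIALS):
    """Net result after n plays with k jackpots: a jackpot nets 10**6 - 1,
    every other play loses the one-dollar fee."""
    return k * (10**6 - 1) - (n - k)


if __name__ == "__main__":
    for k in range(4):
        print(k, net_gain(k), round(binomial_prob(k), 6), round(poisson_prob(k), 6))
```

With $k = 0, 1, 2$ jackpots the net result is a loss of a million, an exact recovery of expenses, and a gain of a million, respectively, in agreement with the text.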

When the variance is infinite, the term “fair game” becomes an
absolute misnomer; there is no reason to believe that the accumulated
net gain $S_n - n\mu'$ fluctuates around zero. In fact, there exist examples
of “fair” games⁵ where the probability tends to one that the player
will have sustained a net loss. The law of large numbers asserts that
this net loss is likely to be of smaller order of magnitude than $n$.
However, nothing more can be asserted. If $a_n$ is an arbitrary sequence such
that $a_n/n \to 0$, it is possible to construct a “fair” game where the
probability tends to one that at the $n$th trial the accumulated net loss
exceeds $a_n$. Problem 15 contains an example where the player has a
practical assurance that his loss will exceed $n/\log n$. This game is
“fair,” and the entrance fee is unity. It is difficult to imagine that a
player will find it “fair” if he is practically sure to sustain a steadily
increasing loss.


*4. THE PETERSBURG GAME


In the classical theory the notion of expectation was not clearly
dissociated from the definition of probability, and no mathematical
formalism existed to handle it. Random variables with infinite
expectations therefore produced insurmountable difficulties, and even quite
recent discussions appear strange to the student of modern probability.
The importance of variables without expectation has been stressed at
the conclusion of section 1, and it seems appropriate here to give an
example for the analogue of the law of large numbers in the case of
such variables. For that purpose we use the time-honored so-called
Petersburg paradox.⁶

5 W. Feller, Note on the law of large numbers and “fair” games, Annals of
Mathematical Statistics, vol. 16 (1945), pp. 301-304.

* This section treats a special topic and may be omitted at first reading.

A single trial in the Petersburg game consists in tossing a true coin
until it falls heads; if this occurs at the $r$th throw the player receives $2^r$
dollars. In other words, the gain at each trial is a random variable
assuming the values $2^1, 2^2, 2^3, \ldots$ with corresponding probabilities
$2^{-1}, 2^{-2}, 2^{-3}, \ldots$. The expectation is formally defined by $\sum x_r f(x_r)$
with $x_r = 2^r$ and $f(x_r) = 2^{-r}$, so that each term of the series equals 1.
Thus the gain has no finite expectation, and the law of large numbers
is inapplicable. Now the game becomes less favorable to the player
when amended by the rule that he receives nothing if a trial takes more
than $N$ tosses (i.e., if the coin falls tails $N$ times in succession). In
this amended game the gain has the finite expectation $N$, and the law
of large numbers applies. It follows that after $n$ trials the accumulated
gain is likely to exceed $nN$ for every $N$. The player can therefore
expect to have a net profit even if he pays an arbitrary fixed entrance fee
$\mu'$ for each trial. This is true for every $\mu'$, but the larger $\mu'$, the larger
must $n$ be in order that a positive gain be probable. The classical
theory concluded that $\mu' = \infty$ is a “fair” entrance fee, but the modern
student will hardly understand the mysterious discussions of this
“paradox.”
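
As a small illustration of the amended game, the sketch below computes the expectation under the cap of $N$ tosses exactly and simulates a single trial. It is only an assumed illustration; the names `amended_expectation` and `play_once` are not from the text.

```python
import random
from fractions import Fraction


def amended_expectation(N):
    """Expected gain when 2**r is paid if heads first appears at toss r <= N
    and nothing is paid otherwise: each of the N terms 2**r * 2**(-r) equals 1."""
    return sum(Fraction(2**r) * Fraction(1, 2**r) for r in range(1, N + 1))


def play_once(rng, N=None):
    """Simulate one trial; N=None means the original (uncapped) game."""
    r = 1
    while rng.random() < 0.5:   # tails with probability 1/2: toss again
        r += 1
    if N is not None and r > N:
        return 0                # amended rule: no payment after N tails
    return 2**r


if __name__ == "__main__":
    print(amended_expectation(10))
    rng = random.Random(0)
    print([play_once(rng) for _ in range(10)])
```

The exact value grows linearly with the cap $N$, which is why no fixed entrance fee can balance the uncapped game.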

It is perfectly possible to determine entrance fees with which the
Petersburg game will have all properties of a “fair” game in the classical
sense, except that these entrance fees will depend on the number of
trials instead of remaining constant. Variable entrance fees are
undesirable in gambling halls, but there the Petersburg game is impossible
anyway because of limited resources. In the case of a finite expectation
$\mu = E(X_k) > 0$, a game is called “fair” if for large $n$ the ratio of the
accumulated gain $S_n$ to the accumulated entrance fees $e_n = n\mu'$ is likely
to be near 1 (that is, if the difference $S_n - e_n$ is likely to be of smaller
order of magnitude than $e_n = n\mu'$). If $E(X_k)$ does not exist, we cannot
put $e_n = n\mu'$ but must determine $e_n$ in another way. We shall say
that a game with accumulated entrance fees $e_n$ is fair in the classical
sense if for every $\epsilon > 0$

(4.1)    $P\left\{ \left| \frac{S_n}{e_n} - 1 \right| > \epsilon \right\} \to 0.$


This is the complete analogue of the law of large numbers where
$e_n = n\mu'$. The latter is interpreted by the physicist to the effect that
the average of $n$ independent measurements is bound to be near $\mu$.
In the present instance the average of $n$ measurements is bound to be
near $e_n/n$. Our limit theorem (4.1), when it applies, has a mathematical
and operational meaning which does not differ from the law of large
numbers.

6 This paradox was discussed by Daniel Bernoulli (1700-1782). Note that
Bernoulli trials are named after James Bernoulli.

We shall now show⁷ that the Petersburg game becomes “fair” in the
classical sense if we put $e_n = n \,\mathrm{Log}\, n$, where $\mathrm{Log}\, n$ is the logarithm to
the base 2, that is, $2^{\mathrm{Log}\, n} = n$.


Proof. We use the method of truncation of section 2, this time
defining the variables $U_k$ and $V_k$ ($k = 1, 2, \ldots, n$) by

(4.2)    $U_k = X_k, \quad V_k = 0$ if $X_k \le n \,\mathrm{Log}\, n$;
         $U_k = 0, \quad V_k = X_k$ if $X_k > n \,\mathrm{Log}\, n$.

Again $X_k = U_k + V_k$, and the $U_k$ are mutually independent. For every
$t$ we have $P\{X_k > t\} < 2/t$ and hence $P\{V_k \ne 0\} < 2/(n \,\mathrm{Log}\, n)$, or

(4.3)    $P\{V_1 + V_2 + \cdots + V_n > 0\} < \frac{2}{\mathrm{Log}\, n} \to 0.$

To verify (4.1) it suffices therefore to prove that

(4.4)    $P\{|U_1 + \cdots + U_n - n \,\mathrm{Log}\, n| > \epsilon n \,\mathrm{Log}\, n\} \to 0.$

Put $\mu_n = E(U_k)$ and $\sigma_n^2 = \mathrm{Var}(U_k)$; these quantities depend on $n$,
but are common to $U_1, U_2, \ldots, U_n$. If $r$ is the largest integer such that
$2^r \le n \,\mathrm{Log}\, n$, then $\mu_n = r$ and hence for sufficiently large $n$

(4.5)    $\mathrm{Log}\, n < \mu_n \le \mathrm{Log}\, n + \mathrm{Log}\,\mathrm{Log}\, n.$

Similarly

(4.6)    $\sigma_n^2 < E(U_k^2) = 2 + 2^2 + \cdots + 2^r < 2^{r+1} \le 2n \,\mathrm{Log}\, n.$

Since the sum $U_1 + \cdots + U_n$ has mean $n\mu_n$ and variance $n\sigma_n^2$, we
have by Chebyshev's inequality

(4.7)    $P\{|U_1 + \cdots + U_n - n\mu_n| > \epsilon n \mu_n\} < \frac{n\sigma_n^2}{\epsilon^2 n^2 \mu_n^2} < \frac{2}{\epsilon^2 \,\mathrm{Log}\, n} \to 0.$

Now by (4.5) $\mu_n \sim \mathrm{Log}\, n$, and hence (4.7) is equivalent to (4.4).
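
The quantities in this proof are easy to evaluate for concrete $n$. The sketch below is an illustration with assumed function names. It computes $r$, the mean and second moment of the truncated variable, and the Chebyshev bound of (4.7), so that (4.5) to (4.7) can be checked numerically for $n \ge 2$.

```python
from math import log2


def truncated_moments(n):
    """Moments of the Petersburg gain truncated at n Log n (Log = log base 2).

    r is the largest integer with 2**r <= n Log n; each retained value 2**j
    has probability 2**(-j), so the mean is r and the second moment is
    2 + 2**2 + ... + 2**r = 2**(r+1) - 2.
    """
    bound = n * log2(n)
    r = 0
    while 2 ** (r + 1) <= bound:
        r += 1
    mean = r
    second_moment = 2 ** (r + 1) - 2
    variance = second_moment - mean**2
    return r, mean, second_moment, variance


def chebyshev_bound(n, eps):
    """Middle term of (4.7): n*sigma_n^2 / (eps^2 * n^2 * mu_n^2)."""
    _, mean, _, variance = truncated_moments(n)
    return n * variance / (eps**2 * n**2 * mean**2)


if __name__ == "__main__":
    for n in (16, 1000, 10**6):
        print(n, truncated_moments(n), chebyshev_bound(n, 0.5))
```

The bound decreases only like $1/\mathrm{Log}\, n$, so the convergence asserted in (4.1) is slow.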


7 This is a special case of a generalized law of large numbers from which necessary
and sufficient conditions for (4.1) can easily be derived; cf. W. Feller, Acta
Scientiarum Litterarum Univ. Szeged, vol. 8 (1937), pp. 191-201.
5. VARIABLE DISTRIBUTIONS


Up to now we have considered only the case where the variables $X_k$
have the same distribution. This situation corresponds to a repetition
of the same game of chance, but it is more interesting to see what
happens if the type of game changes at each step. It is not necessary to
think of gambling places; the statistician who applies statistical tests
is engaged in a dignified sort of gambling, and in his case the
distribution of the random variables changes from occasion to occasion.

To fix ideas we shall imagine that an infinite sequence of probability
distributions is given so that for each $n$ we have $n$ mutually independent
variables $X_1, \ldots, X_n$ with the prescribed distributions. We shall
assume that the means and variances exist and put

(5.1)    $\mu_k = E(X_k), \qquad \sigma_k^2 = \mathrm{Var}(X_k).$

The sum $S_n = X_1 + \cdots + X_n$ has also finite mean and variance

(5.2)    $m_n = E(S_n), \qquad s_n^2 = \mathrm{Var}(S_n)$

given by

(5.3)    $m_n = \mu_1 + \cdots + \mu_n, \qquad s_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$

[cf. formulas IX(2.4) and IX(5.6)]. In the special case of identical
distributions we had $m_n = n\mu$, $s_n^2 = n\sigma^2$.


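The (weak) law of large numbers is said to hold for the sequence $\{X_k\}$
if for every $\epsilon > 0$

(5.4)    $P\left\{ \frac{|S_n - m_n|}{n} > \epsilon \right\} \to 0.$

The sequence $\{X_k\}$ is said to obey the central limit theorem if for every
fixed $\alpha < \beta$

(5.5)    $P\left\{ \alpha < \frac{S_n - m_n}{s_n} < \beta \right\} \to \Phi(\beta) - \Phi(\alpha).$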


It is one of the salient features of probability theory that both the
law of large numbers and the central limit theorem hold for a
surprisingly large class of sequences $\{X_k\}$. In particular, the law of large
numbers holds whenever the $X_k$ are uniformly bounded, that is, whenever
there exists a constant $A$ such that $|X_k| < A$ for all $k$. More generally,
a sufficient condition for the law of large numbers to hold is that

(5.6)    $\frac{s_n}{n} \to 0.$

This is a direct consequence of the Chebyshev inequality, and the
proof given in the opening passage of section 2 applies. Note, however,
that the condition (5.6) is not necessary (cf. problem 14).

Various sufficient conditions for the central limit theorem have been
discovered, but all were superseded by the Lindeberg⁸ theorem according
to which the central limit theorem holds whenever for every $\epsilon > 0$ the
truncated variables $U_k$ defined by

(5.7)    $U_k = X_k - \mu_k$ if $|X_k - \mu_k| \le \epsilon s_n$,
         $U_k = 0$ if $|X_k - \mu_k| > \epsilon s_n$,

satisfy the conditions $s_n \to \infty$ and

(5.8)    $\frac{1}{s_n^2} \sum_{k=1}^{n} E(U_k^2) \to 1.$

If the $X_k$ are uniformly bounded, that is, if $|X_k| < A$, then
$U_k = X_k - \mu_k$ for all $n$ which are so large that $s_n > 2A\epsilon^{-1}$. The left side
in (5.8) then equals 1. Therefore the Lindeberg theorem implies that
every uniformly bounded sequence $\{X_k\}$ of mutually independent random
variables obeys the central limit theorem, provided, of course, that
$s_n \to \infty$. It was found that the Lindeberg conditions are also
necessary for (5.5) to hold.⁹ The proof is deferred to the second volume,
where we shall also give estimates for the difference between the two
sides in (5.5).

In the case where the variables $X_k$ have a common distribution we
found the central limit theorem to be stronger than the law of large
numbers. This is not so in general, and we shall see that the central
limit theorem may apply to sequences which do not obey the law of
large numbers.

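Examples. (a) Let $\lambda > 0$ be fixed, and let $X_k = \pm k^{\lambda}$, each with
probability $\frac{1}{2}$ (e.g., a coin is tossed, and at the $k$th throw the stakes
are $\pm k^{\lambda}$). Here $\mu_k = 0$, $\sigma_k^2 = k^{2\lambda}$, and

(5.9)    $s_n^2 = 1^{2\lambda} + 2^{2\lambda} + 3^{2\lambda} + \cdots + n^{2\lambda} \sim \frac{n^{2\lambda+1}}{2\lambda+1}.$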


8 J. W. Lindeberg, loc. cit. (footnote 3).

9 W. Feller, Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung,
Mathematische Zeitschrift, vol. 40 (1935), pp. 521-559. There also a generalized
central limit theorem is derived which may apply to variables without expectations.
Note that we are here considering only independent variables; for dependent
variables the Lindeberg condition is neither necessary nor sufficient.
The condition (5.6) is satisfied if $\lambda < \frac{1}{2}$. Therefore the law of large
numbers holds if $\lambda < \frac{1}{2}$; we proceed to show that it does not hold if
$\lambda \ge \frac{1}{2}$.

For $k = 1, 2, \ldots, n$ we have $|X_k| = k^{\lambda} \le n^{\lambda}$, so that for
$n > (2\lambda + 1)\epsilon^{-2}$ the truncated variables $U_k$ are identical with the $X_k$.
Hence the Lindeberg condition applies for $\lambda > 0$, and

(5.10)    $P\left\{ \alpha < \left( \frac{2\lambda+1}{n^{2\lambda+1}} \right)^{1/2} S_n < \beta \right\} \to \Phi(\beta) - \Phi(\alpha).$

It follows that $S_n$ is likely to be of the order of magnitude $n^{\lambda + 1/2}$, so
that the law of large numbers cannot apply for $\lambda \ge \frac{1}{2}$. We see that
in this example the central limit theorem applies for all $\lambda > 0$, but the
law of large numbers only if $\lambda < \frac{1}{2}$.
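
The role of the threshold $\lambda = \frac{1}{2}$ can be seen by evaluating $s_n/n$ directly. The sketch below is illustrative only (the function names are assumptions): it computes $s_n$ from (5.9) and compares it with the asymptotic expression.

```python
from math import sqrt


def s_n(n, lam):
    """Standard deviation of S_n when X_k = +-k**lam with probability 1/2 each,
    so that Var(X_k) = k**(2*lam)."""
    return sqrt(sum(k ** (2 * lam) for k in range(1, n + 1)))


def asymptotic_s_n(n, lam):
    """Asymptotic form from (5.9): s_n^2 ~ n**(2*lam + 1) / (2*lam + 1)."""
    return sqrt(n ** (2 * lam + 1) / (2 * lam + 1))


if __name__ == "__main__":
    for lam in (0.25, 0.5, 1.0):
        ratios = [s_n(n, lam) / n for n in (100, 1000, 10000)]
        print(lam, [round(x, 4) for x in ratios])
```

For $\lambda = 0.25$ the ratio $s_n/n$ decreases toward zero, for $\lambda = 0.5$ it settles near $2^{-1/2}$, and for $\lambda = 1$ it grows without bound, in line with condition (5.6).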

(b) Consider two independent sequences of 1000 tossings of a coin
(or emptying two bags of 1000 coins each), and let us examine the
difference $D$ of the number of heads. Let the tossings of the two
sequences be numbered from 1 to 1000 and from 1001 to 2000,
respectively, and define 2000 random variables $X_k$ as follows: If the $k$th
coin falls tails, then $X_k = 0$. If it falls heads, we put $X_k = 1$ for
$k \le 1000$ and $X_k = -1$ for $k > 1000$. Then $D = X_1 + X_2 + \cdots + X_{2000}$.
Moreover, $\mu_k = \pm\frac{1}{2}$, depending on the sequence to which
the coin belongs, $\sigma_k^2 = \frac{1}{4}$, $m_{2000} = 0$, $s_{2000}^2 = 500$. Therefore the
probability that the difference $D$ will lie within the limits $\pm(500)^{1/2}\alpha$ is
$\Phi(\alpha) - \Phi(-\alpha)$, approximately, and $D$ is comparable to the deviation
$S_{2000} - 1000$ of the number of heads in 2000 tossings from its expected
number 1000.
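
The normal approximation in example (b) can be compared with the exact distribution. The sketch below relies on an observation not spelled out in the text: since the number of tails in the second sequence is again binomial, $D + 1000$ has the same distribution as the number of heads in 2000 tossings. Function names are illustrative.

```python
from math import comb, erf, sqrt


def phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def exact_prob(alpha, tosses=1000):
    """Exact P{|D| <= alpha * s}, where s**2 = 2 * tosses / 4 (= 500 here).

    Uses the fact that D + tosses is binomial with 2*tosses trials, p = 1/2.
    """
    half_width = alpha * sqrt(tosses / 2)
    total = 2 * tosses
    favorable = sum(
        comb(total, tosses + d)
        for d in range(-tosses, tosses + 1)
        if abs(d) <= half_width
    )
    return favorable / 2**total


def normal_prob(alpha):
    """Central limit approximation Phi(alpha) - Phi(-alpha)."""
    return phi(alpha) - phi(-alpha)


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, round(exact_prob(alpha), 4), round(normal_prob(alpha), 4))
```

The small remaining discrepancy comes from the discreteness of $D$; a continuity correction would reduce it further.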

(c) An application to the theory of inheritance will illustrate the great
variety of conclusions based on the central limit theorem. In chapter
V, section 5, we have studied traits which depend essentially only on
one pair of genes (alleles). We conceive of other characters (like height)
as the cumulative effect of many pairs of genes. For simplicity,
suppose that for each particular pair of genes there exist three genotypes
AA, Aa, or aa. Let $x_1$, $x_2$, and $x_3$ be the corresponding contributions.
The genotype of an individual is a random event, and the contribution
of a particular pair of genes to the height is a random variable $X$,
assuming the three values $x_1$, $x_2$, $x_3$ with certain probabilities. The
height is the cumulative effect of many such random variables $X_1, X_2,
\ldots, X_n$, and since the contribution of each is small, we may in first
approximation assume that the height is the sum $X_1 + \cdots + X_n$. It
is true that not all the $X_k$ are mutually independent. However, the
central limit theorem holds also for large classes of dependent variables,
and, besides, it is plausible that the great majority of the $X_k$ can be
treated as independent. These considerations can be rendered more
precise; here they serve only as indication of how the central limit
theorem explains why many biometric characters, like height, exhibit
an empirical distribution close to the normal distribution. This theory
permits also the prediction of properties of inheritance, e.g., the
dependence of the mean height of children on the height of their parents.
Such biometric investigations were initiated by F. Galton and Karl
Pearson.


*6. APPLICATIONS TO COMBINATORIAL ANALYSIS 


We shall give two examples of applications of the central limit
theorem to problems not directly connected with probability theory.
Both relate to the $n!$ permutations of the $n$ elements $a_1, a_2, \ldots, a_n$, to
each of which we attribute probability $1/n!$.


(a) Inversions 


In a given permutation the element $a_k$ is said to induce $r$ inversions
if it precedes exactly $r$ elements with smaller index (i.e., elements which
precede $a_k$ in the natural order). For example, in $(a_3 a_6 a_1 a_5 a_2 a_4)$ the
elements $a_1$ and $a_2$ induce no inversion, $a_3$ induces two, $a_4$ none, $a_5$
two, and $a_6$ four. In $(a_6 a_5 a_4 a_3 a_2 a_1)$ the element $a_k$ induces $k - 1$
inversions and there are fifteen inversions in all. The number $X_k$ of
inversions induced by $a_k$ is a random variable, and $S_n = X_1 + \cdots + X_n$ is
the total number of inversions. Here $X_k$ assumes the values $0, 1, \ldots,
k - 1$, each with probability $1/k$, and therefore

(6.1)    $\mu_k = \frac{k-1}{2}, \qquad \sigma_k^2 = \frac{1^2 + 2^2 + \cdots + (k-1)^2}{k} - \left( \frac{k-1}{2} \right)^2 = \frac{k^2 - 1}{12}.$

The number of inversions produced by $a_k$ does not depend on the
relative order of $a_1, a_2, \ldots, a_{k-1}$, and the $X_k$ are mutually independent.
From (6.1) we get

(6.2)    $m_n = \frac{1 + 2 + \cdots + (n-1)}{2} = \frac{n(n-1)}{4} \sim \frac{n^2}{4}$

and

(6.3)    $s_n^2 = \frac{1}{12} \sum_{k=1}^{n} (k^2 - 1) = \frac{2n^3 + 3n^2 - 5n}{72} \sim \frac{n^3}{36}.$


* This section treats a special topic and may be omitted. 
For large $n$ we have $\epsilon s_n > n > |X_k - \mu_k|$, and hence the variables $U_k$ of the
Lindeberg condition are identical with $X_k - \mu_k$. Therefore the central limit
theorem applies, and we conclude that the number $N_n$ of permutations
for which the number of inversions lies between the limits
$\frac{n^2}{4} \pm \frac{\alpha}{6} n^{3/2}$ is,
asymptotically, given by $n!\{\Phi(\alpha) - \Phi(-\alpha)\}$. In particular, for about
one-half of all permutations the number of inversions lies between the
limits $(n^2/4) \pm (0.11)n^{3/2}$.
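
For small $n$ the mean and variance of the number of inversions can be verified by brute-force enumeration. The following is a sketch under the stated definition; a permutation is represented by the tuple of indices in the order in which the elements stand, and all names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def induced_inversions(perm):
    """Map each index k to the number of smaller indices that a_k precedes."""
    return {
        perm[i]: sum(1 for j in range(i + 1, len(perm)) if perm[j] < perm[i])
        for i in range(len(perm))
    }


def total_inversions(perm):
    return sum(induced_inversions(perm).values())


def inversion_moments(n):
    """Exact mean and variance of the inversion count over all n! permutations."""
    counts = [total_inversions(p) for p in permutations(range(1, n + 1))]
    mean = Fraction(sum(counts), len(counts))
    variance = Fraction(sum(c * c for c in counts), len(counts)) - mean**2
    return mean, variance


def predicted_moments(n):
    """Closed forms for the mean n(n-1)/4 and the variance (2n^3+3n^2-5n)/72."""
    return Fraction(n * (n - 1), 4), Fraction(2 * n**3 + 3 * n**2 - 5 * n, 72)


if __name__ == "__main__":
    print(induced_inversions((3, 6, 1, 5, 2, 4)))
    print(inversion_moments(6), predicted_moments(6))
```

Enumeration is feasible only for small $n$; its purpose here is to confirm the closed forms on which the normal approximation rests.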


(b) Cycles 


Every permutation can be broken down into cycles, that is, groups
of elements permuted among themselves. Thus in $(a_3 a_6 a_1 a_5 a_2 a_4)$ we
find that $a_1$ and $a_3$ are interchanged, and that the remaining four
elements are permuted among themselves; this permutation contains two
cycles. If an element is in its natural place, it forms a cycle so that
the identical permutation $(a_1, a_2, \ldots, a_n)$ contains as many cycles as
elements. On the other hand, the cyclical permutations $(a_2, a_3, \ldots, a_n,
a_1)$, $(a_3, a_4, \ldots, a_n, a_1, a_2)$, etc. contain a single cycle each. For the
study of cycles it is convenient to describe the permutation by means
of arrows indicating the places occupied by the several elements. For
example, $1 \to 3 \to 4 \to 1$ indicates that $a_1$ is at the third place, $a_3$ at the
fourth, and $a_4$ at the first, the third step thus completing the cycle.
This description continues with $a_2$, which is the next element in the
natural order. In this notation the permutation $(a_4, a_8, a_1, a_3, a_2, a_5, a_7, a_6)$
is described by: $1 \to 3 \to 4 \to 1$; $2 \to 5 \to 6 \to 8 \to 2$; $7 \to 7$.

Let $X_k$ equal 1 if a cycle is completed at the $k$th step in this build-up;
otherwise let $X_k = 0$. (In the last example $X_3 = X_7 = X_8 = 1$ and
$X_1 = X_2 = X_4 = X_5 = X_6 = 0$.) Clearly $X_1 = 1$ if, and only if, $a_1$ is
at the first place. At the step number $1, 2, \ldots, n$ we have $n, n-1, \ldots, 1$
choices, respectively, and among them just one leads to the completion
of a cycle. Therefore¹⁰ $X_k = 1$ with probability $1/(n - k + 1)$ and
$X_k = 0$ with probability $(n - k)/(n - k + 1)$. The variables $X_k$ are
mutually independent with means and variances


(6.4)    $\mu_k = \frac{1}{n-k+1}, \qquad \sigma_k^2 = \frac{n-k}{(n-k+1)^2},$

whence

(6.5)    $m_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \sim \log n$

and

(6.6)    $s_n^2 = \sum_{k=1}^{n} \frac{n-k}{(n-k+1)^2} \sim \log n.$

$S_n = X_1 + \cdots + X_n$ is the total number of cycles. The average is $m_n$;
the number of permutations with cycles between $\log n + \alpha(\log n)^{1/2}$ and
$\log n + \beta(\log n)^{1/2}$ is given by $n!\{\Phi(\beta) - \Phi(\alpha)\}$, approximately. The
refined forms of the central limit theorem give more precise estimates.¹¹

10 Formally, the distribution of $X_k$ depends not only on $k$ but also on $n$. It suffices
to reorder the $X_k$, starting from $k = n$ down to $k = 1$, to have the distribution
depend only on the subscript.
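
The step-by-step build-up of cycles can be traced in code. The sketch below is an illustration with assumed names: a permutation is given by the tuple of indices in the order of the places, the indicators $X_1, \ldots, X_n$ are produced by following the arrows, and the exact mean and variance of the number of cycles are checked against (6.4) by enumeration.

```python
from fractions import Fraction
from itertools import permutations


def cycle_indicators(perm):
    """Return [X_1, ..., X_n]; X_k = 1 if the k-th arrow closes a cycle.

    perm[i-1] is the index of the element standing at place i.
    """
    place_of = {elem: place for place, elem in enumerate(perm, start=1)}
    visited, indicators = set(), []
    for start in range(1, len(perm) + 1):
        if start in visited:
            continue                      # already covered by an earlier cycle
        current = start
        while True:
            visited.add(current)
            nxt = place_of[current]       # arrow: current -> place of a_current
            indicators.append(1 if nxt == start else 0)
            if nxt == start:
                break
            current = nxt
    return indicators


def cycle_moments(n):
    """Exact mean and variance of the number of cycles over all n! permutations."""
    counts = [sum(cycle_indicators(p)) for p in permutations(range(1, n + 1))]
    mean = Fraction(sum(counts), len(counts))
    return mean, Fraction(sum(c * c for c in counts), len(counts)) - mean**2


def predicted_moments(n):
    """Sums of the means and variances given in (6.4)."""
    mean = sum(Fraction(1, n - k + 1) for k in range(1, n + 1))
    variance = sum(Fraction(n - k, (n - k + 1) ** 2) for k in range(1, n + 1))
    return mean, variance


if __name__ == "__main__":
    print(cycle_indicators((4, 8, 1, 3, 2, 5, 7, 6)))
    print(cycle_moments(6), predicted_moments(6))
```

For the eight-element example in the text the indicators equal 1 exactly at steps 3, 7, and 8.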


*7. THE STRONG LAW OF LARGE NUMBERS 


The (weak) law of large numbers (5.4) asserts that for every
particular sufficiently large $n$ the deviation $|S_n - m_n|$ is likely to be small
in comparison to $n$. It has been pointed out in connection with
Bernoulli trials (chapter VIII) that this does not imply that $|S_n - m_n|/n$
remains small for all large $n$; it can happen that the law of large
numbers applies but that $|S_n - m_n|/n$ continues to fluctuate between
finite or infinite limits. The law of large numbers permits only the
conclusion that large values of $|S_n - m_n|/n$ occur at infrequent
moments.




We say that the sequence $X_k$ obeys the strong law of large numbers if to
every pair $\epsilon > 0$, $\delta > 0$, there corresponds an $N$ such that there is
probability $1 - \delta$ or better that for every $r > 0$ all $r + 1$ inequalities

(7.1)    $\frac{|S_n - m_n|}{n} < \epsilon, \qquad n = N, N+1, \ldots, N+r$

will be satisfied.

We can interpret (7.1) roughly by saying that with an overwhelming
probability $|S_n - m_n|/n$ remains small¹² for all $n > N$.


The Kolmogorov Criterion. The convergence of the series

(7.2)    $\sum \frac{\sigma_k^2}{k^2}$

is a sufficient condition for the strong law of large numbers to apply to the
sequence of mutually independent random variables $X_k$ with variances $\sigma_k^2$.

11 A great variety of asymptotic estimates in combinatorial analysis were
derived by other methods by V. Gončarov, Du domaine d'analyse combinatoire,
Bulletin de l'Académie Sciences URSS, Sér. Math. (in Russian, French summary),
vol. 8 (1944), pp. 3-48. The present method is simpler but more restricted in scope;
cf. W. Feller, The fundamental limit theorems in probability, Bulletin of the
American Mathematical Society, vol. 51 (1945), pp. 800-832.

* This section treats a special topic and may be omitted.

12 The general theory introduces a sample space corresponding to the infinite
sequence $\{X_k\}$. The strong law then states that with probability one $|S_n - m_n|/n$
tends to zero. In real variable terminology the strong law asserts convergence
almost everywhere, and the weak law is equivalent to convergence in measure.


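Proof. Let $A_\nu$ be the event that for at least one $n$ with $2^{\nu-1} < n \le 2^{\nu}$
the inequality (7.1) does not hold. Obviously it suffices to prove that
for all $\nu$ sufficiently large ($\nu > \log N$) and all $r$

         $P\{A_\nu\} + P\{A_{\nu+1}\} + \cdots + P\{A_{\nu+r}\} < \delta,$

that is, that the series $\sum P\{A_\nu\}$ converges. Now the event $A_\nu$ implies
that for some $n$ with $2^{\nu-1} < n \le 2^{\nu}$

(7.3)    $|S_n - m_n| \ge \epsilon \cdot 2^{\nu-1}$

and by Kolmogorov's inequality (chapter IX, section 7)

(7.4)    $P\{A_\nu\} \le 4\epsilon^{-2} \cdot s_{2^\nu}^2 \cdot 2^{-2\nu}.$

Hence

(7.5)    $\sum_{\nu=1}^{\infty} P\{A_\nu\} \le 4\epsilon^{-2} \sum_{\nu=1}^{\infty} 2^{-2\nu} \sum_{k=1}^{2^\nu} \sigma_k^2 = 4\epsilon^{-2} \sum_{k=1}^{\infty} \sigma_k^2 \sum_{2^\nu \ge k} 2^{-2\nu} \le 8\epsilon^{-2} \sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2},$

which accomplishes the proof.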
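
The last step of the proof rests on a bound for the inner geometric sum that arises when the order of summation is interchanged. A short worked version, with $\nu_0$ introduced here as auxiliary notation for the smallest integer such that $2^{\nu_0} \ge k$, may be helpful:

```latex
\sum_{2^{\nu} \ge k} 2^{-2\nu}
  = \sum_{\nu \ge \nu_0} 4^{-\nu}
  = \frac{4^{-\nu_0}}{1 - \tfrac14}
  = \frac{4}{3}\, 2^{-2\nu_0}
  \le \frac{4}{3}\, k^{-2},
\qquad\text{so that}\qquad
4\epsilon^{-2} \sum_{k=1}^{\infty} \sigma_k^2 \sum_{2^{\nu} \ge k} 2^{-2\nu}
  \le \frac{16}{3}\, \epsilon^{-2} \sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2} .
```

The right side is finite whenever the series (7.2) converges, which is all the argument requires.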


As a typical application we prove the 


Theorem. If the mutually independent random variables $X_k$ have a
common distribution $\{f(x_j)\}$ and if $\mu = E(X_k)$ exists, then the strong law
of large numbers applies to the sequence $\{X_k\}$.


This theorem is, of course, stronger than the weak law of section 1. 
The two theorems are treated independently because of the method- 
ological interest of the proofs. For a converse cf. problem 17. 

Proof. We again use the method of truncation. Two new sequences
of random variables are introduced by

(7.6)    $U_k = X_k, \quad V_k = 0$ if $|X_k| < k$,
         $U_k = 0, \quad V_k = X_k$ if $|X_k| \ge k$.

The $U_k$ are mutually independent, and we shall show that they satisfy
Kolmogorov's criterion. Clearly

(7.7)    $\sigma_k^2 \le E(U_k^2) = \sum_{|x_j| < k} x_j^2 f(x_j).$

Put for abbreviation

(7.8)    $a_\nu = \sum_{\nu - 1 \le |x_j| < \nu} |x_j| f(x_j).$

Then the series $\sum a_\nu$ converges since $E(X_k)$ exists. Moreover, from (7.7),

(7.9)    $\sigma_k^2 \le a_1 + 2a_2 + 3a_3 + \cdots + k a_k$

and

(7.10)    $\sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2} \le \sum_{k=1}^{\infty} \frac{1}{k^2} \sum_{\nu=1}^{k} \nu a_\nu = \sum_{\nu=1}^{\infty} \nu a_\nu \sum_{k=\nu}^{\infty} \frac{1}{k^2} < 2 \sum_{\nu=1}^{\infty} a_\nu < \infty.$

Finally

(7.11)    $E(U_k) = \mu_k = \sum_{|x_j| < k} x_j f(x_j)$

so that $\mu_k \to \mu$ and hence $(\mu_1 + \mu_2 + \cdots + \mu_n)/n \to \mu$. Applying
the strong law of large numbers to $\{U_k\}$, we conclude that with
probability $1 - \delta$ or better

(7.12)    $\left| n^{-1} \sum_{k=1}^{n} U_k - \mu \right| < \epsilon$

for all $n > N$. It suffices now to prove that the $V_n$ can be neglected,
that is, that the probability of one or more $V_n$ with $n > N$ being
different from zero tends to 0 with $N \to \infty$. The first Borel-Cantelli
lemma (chapter VIII, section 3) applies with obvious verbal changes,
and it suffices to prove that $\sum P\{V_n \ne 0\}$ converges. Now

(7.13)    $P\{V_n \ne 0\} = \sum_{|x_j| \ge n} f(x_j) \le \frac{a_{n+1}}{n} + \frac{a_{n+2}}{n+1} + \frac{a_{n+3}}{n+2} + \cdots$

and hence

(7.14)    $\sum_{n=1}^{\infty} P\{V_n \ne 0\} \le \sum_{n=1}^{\infty} \sum_{\nu=n}^{\infty} \frac{a_{\nu+1}}{\nu} = \sum_{\nu=1}^{\infty} \frac{a_{\nu+1}}{\nu} \sum_{n=1}^{\nu} 1 = \sum_{\nu=1}^{\infty} a_{\nu+1} < \infty$

as asserted.
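
The estimate used in (7.10) for the tail of the series $\sum 1/k^2$ can be justified by a telescoping comparison. As a brief worked step (the comparison with $1/(k(k-1))$ is one possible argument, not necessarily the one the author had in mind):

```latex
\sum_{k=\nu}^{\infty} \frac{1}{k^2}
  < \frac{1}{\nu^2} + \sum_{k=\nu+1}^{\infty} \frac{1}{k(k-1)}
  = \frac{1}{\nu^2} + \sum_{k=\nu+1}^{\infty} \left( \frac{1}{k-1} - \frac{1}{k} \right)
  = \frac{1}{\nu^2} + \frac{1}{\nu}
  \le \frac{2}{\nu},
\qquad\text{hence}\qquad
\nu a_\nu \sum_{k=\nu}^{\infty} \frac{1}{k^2} < 2 a_\nu .
```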


8. PROBLEMS FOR SOLUTION 


1. Prove that the law of large numbers applies in example (5.a) also when
$\lambda \le 0$. The central limit theorem holds if $\lambda \ge -\frac{1}{2}$.

2. Decide whether the law of large numbers and the central limit theorem
hold for the sequences of mutually independent variables $X_k$ with distributions
defined as follows ($k \ge 1$):
(a) $P\{X_k = \pm 2^k\} = \frac{1}{2}$;
(b) $P\{X_k = \pm 2^k\} = 2^{-(2k+1)}$, $P\{X_k = 0\} = 1 - 2^{-2k}$;
(c) $P\{X_k = \pm k\} = \frac{1}{2} k^{-1/2}$, $P\{X_k = 0\} = 1 - k^{-1/2}$.

3. Ljapunov's condition (1901). Suppose that for some fixed $\delta > 0$ we have
$E(|X_k|^{2+\delta}) = \lambda_k$ where $\lambda_k/s_k^2 < \text{const}$. Show that Lindeberg's conditions
are satisfied if $s_n \to \infty$.

4. Find sufficient conditions on {Z,} for the weak law of large numbers 
and/or the central limit theorem to hold for the mutually independent vari- 
ables {X,}, where {X,} assumes the values 


1 2 k 
ele aie fe ey, 
% toi toi > HE 


each with probability 1/(2k + 1). 
5. Do the same problem if $X_k$ assumes the values $a_k$, $-a_k$, and 0 with
probabilities $p_k$, $p_k$, and $1 - 2p_k$.


Note: The following seven problems treat the weak law of large numbers for dependent 
variables. 

6. In problem V, 13 let $X_k = 1$ if the $k$th throw results in red, and $X_k = 0$
otherwise. Show that the law of large numbers does not apply.

7. Let the $\{X_k\}$ be mutually independent and have a common distribution
with mean $\mu$ and finite variance. If $S_n = X_1 + \cdots + X_n$, prove that the law
of large numbers does not hold for the sequence $\{S_n\}$ but holds for $a_n S_n$ if
$n a_n \to 0$.

8. Let $\{X_k\}$ be a sequence of random variables such that $X_k$ may depend on
$X_{k-1}$ and $X_{k+1}$ but is independent of all other $X_j$. Show that the law of large
numbers holds, provided the $X_k$ have bounded variances.

9. If the joint distribution of $(X_1, \ldots, X_n)$ is defined for every $n$ so that the
variances are bounded and all covariances are negative, the law of large numbers
applies.

10. Continuation. Replace the condition $\mathrm{Cov}(X_j, X_k) < 0$ by the
assumption that $\mathrm{Cov}(X_j, X_k) \to 0$ uniformly as $|j - k| \to \infty$. Prove that the law
of large numbers holds.


11. If $|S_n| < cn$ and $\mathrm{Var}(S_n) > \alpha n^2$, then the law of large numbers does not
apply to $\{X_k\}$.

12. In the Polya urn scheme [example V(2.c)] let $X_k$ equal 1 or 0 according
to whether the $k$th ball drawn is black or red. Then $S_n$ is the number of
black balls in $n$ drawings. Prove that the law of large numbers does not apply
to $\{X_k\}$. (Hint: Use problems 11 and IX, 30.)

13. The mutually independent random variables $X_k$ assume the values
$r = 2, 3, 4, \ldots$ with probability $p_r = c/(r^2 \log r)$ where $c$ is a constant such that
$\sum p_r = 1$. Show that the generalized law of large numbers (4.1) holds if we
put $e_n = c \cdot n \log \log n$.

14. Let $\{X_n\}$ be a sequence of mutually independent random variables
such that $X_n = \pm 1$ with probability $(1 - 2^{-n})/2$ and $X_n = \pm 2^n$ with
probability $2^{-n-1}$. Prove that both the weak and the strong law of large numbers
apply to $\{X_k\}$. (Note: This shows that the condition (5.6) is not necessary.)


15. Example of an unfavorable “fair” game. Let the possible values of the
gain at each trial be $0, 2, 2^2, 2^3, \ldots$; the probability of the gain being $2^k$ is

(8.1)    $p_k = \frac{1}{2^k k(k+1)},$

and the probability of 0 is $p_0 = 1 - (p_1 + p_2 + \cdots)$. The expected gain is

(8.2)    $\mu = \sum 2^k p_k = (1 - \tfrac{1}{2}) + (\tfrac{1}{2} - \tfrac{1}{3}) + (\tfrac{1}{3} - \tfrac{1}{4}) + \cdots = 1.$

Assume that at each trial the player pays a unit amount as entrance fee, so
that after $n$ trials his net gain (or loss) is $S_n - n$. Show that for every $\epsilon > 0$
the probability approaches unity that in $n$ trials the player will have sustained a
loss greater than $(1 - \epsilon)n/\mathrm{Log}_2\, n$, where $\mathrm{Log}_2\, n$ denotes the logarithm to the
base 2. In symbols, prove that

(8.3)    $P\left\{ S_n - n < -\frac{(1 - \epsilon)n}{\mathrm{Log}_2\, n} \right\} \to 1.$

Hint: Use the truncation method of section 4, but replace the bound $n \,\mathrm{Log}\, n$
of (4.2) by $n/\mathrm{Log}_2\, n$. Show that the probability that $U_k = X_k$ for all $k \le n$
tends to 1 and prove that

(8.4)    $P\left\{ |U_1 + \cdots + U_n - nE(U_1)| < \frac{\epsilon n}{\mathrm{Log}_2\, n} \right\} \to 1,$

(8.5)    $1 - \frac{1}{\mathrm{Log}_2\, n} \ge E(U_1) \ge 1 - \frac{1 + \epsilon}{\mathrm{Log}_2\, n}.$

For details see the paper cited in footnote 5.

16. Let $\{X_k\}$ be a sequence of mutually independent random variables with
a common distribution. Suppose that the $X_k$ do not have a finite expectation
and let $A$ be a positive constant. The probability is one that infinitely many
among the events $|X_n| > An$ occur.

17. Converse to the strong law of large numbers. Under the assumption of
problem 16 there is probability one that $|S_n| > An$ for infinitely many $n$.

18. A converse to Kolmogorov's criterion. If $\sum \sigma_k^2/k^2$ diverges, then there exists
a sequence $\{X_k\}$ of mutually independent random variables with $\mathrm{Var}\{X_k\} = \sigma_k^2$
for which the strong law of large numbers does not apply. (Hint: Prove first
that the convergence of $\sum P\{|X_n| > \epsilon n\}$ is a necessary condition for the strong
law to apply.)


CHAPTER XI 


Integral Valued Variables. 


Generating Functions 


1. GENERALITIES 


Among discrete random variables those assuming only the integral 
values k = 0, 1, 2, ... are of special importance. Their study is 
facilitated by the powerful method of generating functions which will 
later be recognized as a special case of the method of characteristic 
functions on which the theory of probability depends to a large extent. 
More generally, the subject of generating functions belongs to the 
domain of operational methods which are widely used in the theory of 
differential and integral equations. In the theory of probability gener- 
ating functions have been used since DeMoivre and Laplace, but the 
power and the possibilities of the method are rarely fully utilized. 


Definition. Let $a_0, a_1, a_2, \ldots$ be a sequence of real numbers. If

(1.1)    $A(s) = a_0 + a_1 s + a_2 s^2 + \cdots$

converges in some interval $-s_0 < s < s_0$, then $A(s)$ is called the generating
function of the sequence $\{a_j\}$.


The variable $s$ itself has no significance. If the sequence $\{a_j\}$ is
bounded, then a comparison with the geometric series shows that (1.1)
converges at least for $|s| < 1$.


Examples. If $a_j = 1$ for all $j$, then $A(s) = 1/(1 - s)$. The
generating function of the sequence $(0, 0, 1, 1, 1, \ldots)$ is $s^2/(1 - s)$. The
sequence $a_j = 1/j!$ has the generating function $e^s$. For fixed $n$ the
sequence $a_j = \binom{n}{j}$ has the generating function $(1 + s)^n$. If $X$ is the
number scored in a throw of a perfect die, the probability distribution
of $X$ has the generating function $(s + s^2 + s^3 + s^4 + s^5 + s^6)/6$.
Let $X$ be a random variable assuming the values $0, 1, 2, \ldots$. It
will be convenient to have a notation both for the distribution of $X$
and for its tails, and we shall write

(1.2)    $P\{X = j\} = p_j, \qquad P\{X > j\} = q_j.$

Then

(1.3)    $q_k = p_{k+1} + p_{k+2} + \cdots, \qquad k \ge 0.$

The generating functions of the sequences $\{p_j\}$ and $\{q_k\}$ are

(1.4)    $P(s) = p_0 + p_1 s + p_2 s^2 + p_3 s^3 + \cdots$

(1.5)    $Q(s) = q_0 + q_1 s + q_2 s^2 + q_3 s^3 + \cdots$

As $P(1) = 1$, the series for $P(s)$ converges absolutely at least for
$-1 \le s \le 1$. The coefficients of $Q(s)$ are less than unity, and so the
series for $Q(s)$ converges at least in the open interval $-1 < s < 1$.

Theorem 1. For $-1 < s < 1$ we have

(1.6)    $Q(s) = \frac{1 - P(s)}{1 - s}.$

Proof. The coefficient of $s^n$ in $(1 - s) \cdot Q(s)$ equals $q_n - q_{n-1} = -p_n$
when $n \ge 1$, and equals $q_0 = p_1 + p_2 + \cdots = 1 - p_0$ when $n = 0$.
Therefore $(1 - s) \cdot Q(s) = 1 - P(s)$ as asserted.


Next we examine the derivative

(1.7)    $P'(s) = \sum_{k=1}^{\infty} k p_k s^{k-1}.$

The series converges at least for $-1 < s < 1$. For $s = 1$ the right
side reduces formally to $\sum k p_k = E(X)$. Whenever this expectation
exists, the derivative $P'(s)$ will be continuous in the closed interval
$-1 \le s \le 1$. If $\sum k p_k$ diverges, then $P'(s) \to \infty$ as $s \to 1$. In this case
we say that $X$ has an infinite expectation and write $P'(1) = E(X) = \infty$.
(All quantities being positive, there is no danger in the use of the
symbol $\infty$.) Applying the mean value theorem to the right side in (1.6),
we see that $Q(s) = P'(\sigma)$ where $\sigma$ is a point lying between $s$ and 1. The
function $Q(s)$ increases monotonically as $s \to 1$, and so $Q(s) \to E(X)$
(finite or infinite). This proves


Theorem 2. For $E(X)$ we have the two expressions

(1.8)    $E(X) = \sum_{j=1}^{\infty} j p_j = \sum_{k=0}^{\infty} q_k.$

In terms of the generating functions

(1.9)    $E(X) = P'(1) = Q(1).$

By differentiation of (1.7) and of the relation $P'(s) = Q(s) - (1 - s)Q'(s)$
we find in the same way

(1.10)    $E(X(X - 1)) = \sum k(k - 1)p_k = P''(1) = 2Q'(1).$


To obtain the variance of $X$ we have to add $E(X) - E^2(X)$ which
leads us to

Theorem 3. We have

(1.11)    $\mathrm{Var}(X) = P''(1) + P'(1) - P'^2(1) = 2Q'(1) + Q(1) - Q^2(1).$

In the case of an infinite variance $P''(s) \to \infty$ as $s \to 1$.


Frequently the formulas (1.9) and (1.11) provide the simplest means 
to calculate E(X) and Var(X). 
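
As an illustration of formulas (1.9) to (1.11), the sketch below treats the die example with exact rational arithmetic. The helper names are assumptions for this illustration; a generating function with finitely many terms is represented simply by its list of coefficients.

```python
from fractions import Fraction


def tails(p):
    """q_k = P{X > k} computed from p_j = P{X = j}."""
    q, remaining = [], Fraction(1)
    for pj in p:
        remaining -= pj
        q.append(remaining)
    return q


def derivative_at_one(coeffs, order=1):
    """Value at s = 1 of the order-th derivative of sum coeffs[k] * s**k."""
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        factor = 1
        for i in range(order):
            factor *= k - i
        total += factor * c
    return total


die = [Fraction(0)] + [Fraction(1, 6)] * 6      # p_0, p_1, ..., p_6
q = tails(die)                                  # q_0, q_1, ..., q_6

mean_from_p = derivative_at_one(die, 1)         # P'(1)
mean_from_q = sum(q)                            # Q(1)
var_from_p = derivative_at_one(die, 2) + mean_from_p - mean_from_p**2
var_from_q = 2 * derivative_at_one(q, 1) + mean_from_q - mean_from_q**2

if __name__ == "__main__":
    print(mean_from_p, mean_from_q, var_from_p, var_from_q)
```

Both routes give the mean $7/2$ and the variance $35/12$ of a single throw.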


2. CONVOLUTIONS


Let $X$ and $Y$ be non-negative independent integral-valued random
variables with probability distributions $P\{X = j\} = a_j$ and
$P\{Y = j\} = b_j$. The event $(X = j, Y = k)$ has probability $a_j b_k$.
The sum $S = X + Y$ is a new random variable, and the event $S = r$
is the union of the mutually exclusive events

$(X = 0, Y = r)$, $(X = 1, Y = r - 1)$, $(X = 2, Y = r - 2)$, $\ldots$, $(X = r, Y = 0)$.

Therefore the distribution $c_r = P\{S = r\}$ is given by

(2.1)    $c_r = a_0 b_r + a_1 b_{r-1} + a_2 b_{r-2} + \cdots + a_{r-1} b_1 + a_r b_0.$


The operation (2.1), leading from the two sequences $\{a_k\}$ and $\{b_k\}$
to a new sequence $\{c_k\}$, occurs so frequently that it is convenient to
introduce a special name and notation for it.

Definition. Let $\{a_k\}$ and $\{b_k\}$ be any two number sequences (not
necessarily probability distributions). The new sequence $\{c_r\}$ defined by
(2.1) is called the convolution¹ of $\{a_k\}$ and $\{b_k\}$ and will be denoted by

(2.2)  $\{c_k\} = \{a_k\} * \{b_k\}$.


¹ Some writers prefer the German word Faltung. The French equivalent is com-


Examples. (a) If $a_k = b_k = 1$ for all $k \ge 0$, then $c_k = k + 1$. If
$a_k = k$, $b_k = 1$, then $c_k = 1 + 2 + \cdots + k = k(k + 1)/2$. Finally, if
$a_0 = a_1 = 1/2$, $a_k = 0$ for $k \ge 2$, then $c_k = (b_k + b_{k-1})/2$, etc.
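
The three cases of example (a) can be checked mechanically. The sketch below implements (2.1) for finite (truncated) sequences; the helper name `convolve`, the truncation length `N`, and the test sequence of squares are illustrative assumptions.

```python
from fractions import Fraction

def convolve(a, b):
    """First min(len(a), len(b)) terms of the convolution (2.1)."""
    n = min(len(a), len(b))
    return [sum(a[j] * b[r - j] for j in range(r + 1)) for r in range(n)]

N = 8
ones = [1] * N                                 # a_k = 1
ramp = list(range(N))                          # a_k = k
half = [Fraction(1, 2)] * 2 + [0] * (N - 2)    # a_0 = a_1 = 1/2, a_k = 0 otherwise
squares = [k * k for k in range(N)]            # an arbitrary sequence {b_k}

print(convolve(ones, ones))      # should agree with k + 1
print(convolve(ramp, ones))      # should agree with k(k + 1)/2
print(convolve(half, squares))   # should agree with (b_k + b_{k-1})/2
```

Truncation does no harm here because $c_r$ depends only on the terms with index at most $r$.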


The sequences $\{a_k\}$ and $\{b_k\}$ have generating functions $A(s) = \sum a_k s^k$
and $B(s) = \sum b_k s^k$. The product $A(s)B(s)$ can be obtained by termwise
multiplication of the power series for $A(s)$ and $B(s)$. Collecting terms
with equal powers of $s$, we find that the coefficient $c_r$ of $s^r$ in the
expansion of $A(s)B(s)$ is given by (2.1). We have thus the

Theorem. If $\{a_k\}$ and $\{b_k\}$ are sequences with generating functions
$A(s)$ and $B(s)$, and $\{c_k\}$ is their convolution, then the generating function
$C(s) = \sum c_k s^k$ is the product

(2.3)  $C(s) = A(s)B(s)$.

If $X$ and $Y$ are non-negative integral-valued mutually independent random
variables with generating functions $A(s)$ and $B(s)$, then their sum $X + Y$
has the generating function $A(s)B(s)$.


Let now $\{a_k\}, \{b_k\}, \{c_k\}, \{d_k\}, \ldots$ be any sequences. We can form
the convolution $\{a_k\} * \{b_k\}$, and then the convolution of this new
sequence with $\{c_k\}$, etc. The generating function of
$\{a_k\} * \{b_k\} * \{c_k\} * \{d_k\}$ is $A(s)B(s)C(s)D(s)$, and this fact shows
that the order in which the convolutions are performed is immaterial.
For example, $\{a_k\} * \{b_k\} * \{c_k\} = \{c_k\} * \{b_k\} * \{a_k\}$, etc. Thus the
convolution is an associative and commutative operation (exactly as
the summation of random variables).

In the study of sums of independent random variables $X_n$ the special
case where the $X_n$ have a common distribution is of particular interest.
If $\{a_j\}$ is the common probability distribution of the $X_n$, then the
distribution of $S_n = X_1 + \cdots + X_n$ will be denoted by $\{a_j\}^{n*}$. Thus


(2.4)  $\{a_j\}^{2*} = \{a_j\} * \{a_j\}, \qquad \{a_j\}^{3*} = \{a_j\}^{2*} * \{a_j\}, \ \ldots$

and generally

(2.5)  $\{a_j\}^{n*} = \{a_j\}^{(n-1)*} * \{a_j\}$.

In words, $\{a_j\}^{n*}$ is the sequence of numbers whose generating function is
$A^n(s)$. In particular, $\{a_j\}^{1*}$ is the same as $\{a_j\}$, and $\{a_j\}^{0*}$ is defined
as the sequence whose generating function is $A^0(s) = 1$, that is, the
sequence $(1, 0, 0, 0, \ldots)$.

Examples. (b) Binomial distribution. The generating function of
the binomial distribution $b(k; n, p) = \binom{n}{k} p^k q^{n-k}$ is

(2.6)  $\sum_{k=0}^{n} \binom{n}{k} (ps)^k q^{n-k} = (q + ps)^n$.

The fact that this generating function is the nth power of $q + ps$ shows
that $\{b(k; n, p)\}$ is the distribution of a sum $S_n = X_1 + \cdots + X_n$ of $n$
independent random variables with the common generating function
$q + ps$. Thus

(2.7)  $\{b(k; n, p)\} = \{b(k; 1, p)\}^{n*}$.

The representation $S_n = X_1 + \cdots + X_n$ has already been used [e.g.,
in examples IX(3.a) and IX(5.a)]. The preceding argument may be
reversed to obtain a new derivation of the binomial distribution. The
multiplicative property $(q + ps)^m (q + ps)^n = (q + ps)^{m+n}$ shows also
that

(2.8)  $\{b(k; m, p)\} * \{b(k; n, p)\} = \{b(k; m + n, p)\}$,

which is the same as formula VI(10.4). Differentiation of $(q + ps)^n$
leads also to a simple proof that $E(S_n) = np$ and $\mathrm{Var}(S_n) = npq$.
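
Relations (2.7) and (2.8) lend themselves to a direct numerical check. The following sketch builds the n-fold convolution by repeated multiplication of coefficient lists; the names `convolve_full`, `conv_power`, and `binomial`, and the value $p = 1/3$, are assumptions of this illustration.

```python
from fractions import Fraction
from math import comb

def convolve_full(a, b):
    """Coefficients of A(s)B(s) for two finite sequences."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

def conv_power(a, n):
    """n-fold convolution of a with itself; n = 0 gives the sequence (1)."""
    result = [1]
    for _ in range(n):
        result = convolve_full(result, a)
    return result

def binomial(n, p):
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

p = Fraction(1, 3)
bernoulli = [1 - p, p]                                  # generating function q + ps
print(conv_power(bernoulli, 5) == binomial(5, p))       # relation (2.7)
print(convolve_full(binomial(3, p), binomial(4, p)) == binomial(7, p))  # relation (2.8)
```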

(c) Poisson distribution. The generating function of the distribution
$p(k; \lambda) = e^{-\lambda} \lambda^k / k!$ is

(2.9)  $\sum_{k=0}^{\infty} e^{-\lambda} \dfrac{(\lambda s)^k}{k!} = e^{-\lambda + \lambda s}$.

It follows that

(2.10)  $\{p(k; \lambda)\} * \{p(k; \mu)\} = \{p(k; \lambda + \mu)\}$,

which is the same as formula VI(10.5). By differentiation we find
again that both mean and variance of the Poisson distribution equal $\lambda$
[cf. example IX(4.c)].


(d) Geometric and negative binomial distributions. Let $X$ be a random
variable with the geometric distribution

(2.11)  $P\{X = k\} = q^k p, \qquad k = 0, 1, 2, \ldots,$

where $p$ and $q$ are positive constants with $p + q = 1$. The corresponding
generating function is

(2.12)  $p \sum_{k=0}^{\infty} (qs)^k = \dfrac{p}{1 - qs}$.

Using the results of section 1 we find easily $E(X) = q/p$ and
$\mathrm{Var}(X) = q/p^2$, in agreement with the findings in example IX(3.c).

In a sequence of Bernoulli trials the probability that the first success
occurs after exactly $k$ failures (i.e., at the $(k+1)$st trial) is $q^k p$, and so $X$
may be interpreted as the waiting time for the first success. Strictly
speaking, such an interpretation refers to an infinite sample space, and
the advantage of the formal definition (2.11) and the terminology of
random variables is that we need not worry about the structure of the
original sample space. The same is true of the waiting time for the rth
success. If $X_k$ denotes the number of failures following the $(k-1)$st
and preceding the kth success, then $S_r = X_1 + X_2 + \cdots + X_r$ is the
total number of failures preceding the rth success (and $S_r + r$ is the
number of trials up to and including the rth success). The notion of
Bernoulli trials requires that the $X_k$ should be mutually independent
with the same distribution (2.11), and we can define the $X_k$ by this
property. Then $S_r$ has the generating function

(2.13)  $\left( \dfrac{p}{1 - qs} \right)^r$,

and the binomial expansion II(8.7) shows at once that the coefficient
of $s^k$ equals

(2.14)  $f(k; r, p) = \binom{-r}{k} p^r (-q)^k, \qquad k = 0, 1, 2, \ldots.$

It follows that $P\{S_r = k\} = f(k; r, p)$, in agreement with the formula
for the number of failures preceding the rth success derived in chapter
VI, section 8. We can restate this result by saying that the distribution
$\{f(k; r, p)\}$ is the r-fold convolution of the geometric distribution with
itself, in symbols

(2.15)  $\{f(k; r, p)\} = \{q^k p\}^{r*}$.
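
Relation (2.15) can be verified term by term for integral $r$. The sketch below is only an illustration under stated assumptions: the sequences are truncated to `K` terms (which leaves the first `K` convolution terms exact), the values of `p` and `r` are arbitrary, and $\binom{-r}{k}(-1)^k$ is evaluated as $\binom{r+k-1}{k}$.

```python
from fractions import Fraction
from math import comb

def convolve(a, b):
    """First min(len(a), len(b)) terms of the convolution (2.1)."""
    n = min(len(a), len(b))
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(n)]

def geometric(p, K):
    q = 1 - p
    return [q**k * p for k in range(K)]                      # (2.11)

def neg_binomial(r, p, K):
    q = 1 - p
    # C(-r, k) (-q)^k equals C(r + k - 1, k) q^k
    return [comb(r + k - 1, k) * p**r * q**k for k in range(K)]   # (2.14)

p, r, K = Fraction(2, 5), 4, 12
dist = geometric(p, K)
for _ in range(r - 1):
    dist = convolve(dist, geometric(p, K))   # builds the r-fold convolution

print(dist == neg_binomial(r, p, K))
```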


So far we have considered $r$ as an integer. It will be recalled from
chapter VI, section 8, that $\{f(k; r, p)\}$ defines the negative binomial
distribution also when $r > 0$ is not an integer. The generating function is
still defined by (2.13), and we see that for arbitrary $r > 0$ the mean
and variance of the negative binomial distribution are $rq/p$ and $rq/p^2$ and
that

(2.16)  $\{f(k; r_1, p)\} * \{f(k; r_2, p)\} = \{f(k; r_1 + r_2, p)\}$.


3. APPLICATION TO FIRST PASSAGE AND RECURRENCE
TIMES IN BERNOULLI TRIALS


This section is inserted mainly for illustration. The results will be
derived by different methods (see example XIII(3.b) and problem
211.7; chapter XIV, section 5, and problems 11 and 15-17). For the
special case $p = 1/2$ the results are contained in chapter III. However,
the following derivation provides an excellent example for the method
of generating functions; and, in addition, it is instructive to compare
the different approaches.

We consider Bernoulli trials with the probability of success $p$ and
put $X_k = 1$ if the kth trial results in success, $X_k = -1$ otherwise.
Then $S_n = X_1 + \cdots + X_n$ is the accumulated excess of successes over
failures in $n$ trials. In the more picturesque gambling language $S_n$ is
called Peter's net gain in the first $n$ trials. It is convenient to put
$S_0 = 0$.


(a) First Passages 


Suppose that Peter decides to quit at the first moment when he has
a positive net gain (necessarily of a unit amount). A direct enumeration
of all possibilities reveals that this will happen at trials number
1, 3, 5, 7, ... with probabilities $p$, $qp^2$, $2q^2p^3$, $5q^3p^4$, ... but a general
rule is not discernible. The sum $\sigma$ of these probabilities equals the
probability that Peter's net gain will ever become positive. Not even this [...]

We proceed to calculate $\sigma$ and the probabilities $\lambda_n^{(x)}$ that it will take
exactly $n$ trials until the net gain reaches the level $x$ for the first time.

In more formal language we seek the probability $\lambda_n$ that $S_1 \le 0$,
$S_2 \le 0$, ..., $S_{n-1} \le 0$, $S_n = 1$. More generally, we shall say that a
first passage through the point $x > 0$ occurs at the nth trial if

(3.1)  $S_1 < x, \quad S_2 < x, \quad \ldots, \quad S_{n-1} < x, \quad S_n = x$.


The probability of this event will be denoted by $\lambda_n^{(x)}$, and for brevity we put
$\lambda_n^{(1)} = \lambda_n$. In gambling (3.1) signifies that Peter's net gain reaches the
level $x > 0$ for the first time at the nth trial. The term first passage
[...] $S'_2 = X_{r+1} + X_{r+2}$, ..., which are independent of the first $r$ trials.
A first passage through $x = 2$ at time $n$ occurs if, and only if, $S'_1 \le 0$,
..., $S'_{n-r-1} \le 0$, $S'_{n-r} = 1$, and the probability of this event is $\lambda_{n-r}$.
In other words, the probability that the first passages through $x = 1$
and $x = 2$ occur at trials number $r$ and $n > r$ is $\lambda_r \lambda_{n-r}$. We conclude
that the first passage through $x = 2$ at time $n$ has probability

(3.2)  $\lambda_n^{(2)} = \lambda_1 \lambda_{n-1} + \lambda_2 \lambda_{n-2} + \cdots + \lambda_{n-1} \lambda_1$.


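Remembering that $\lambda_0 = 0$, we see that $\{\lambda_n^{(2)}\} = \{\lambda_n\} * \{\lambda_n\}$ is the
convolution of $\{\lambda_n\}$ with itself. Introducing the generating functions

(3.3)  $\Lambda(s) = \sum_{n=1}^{\infty} \lambda_n s^n, \qquad \Lambda^{(x)}(s) = \sum_{n=1}^{\infty} \lambda_n^{(x)} s^n$

we have $\Lambda^{(2)}(s) = \Lambda^2(s)$ and, repeating the argument by induction,

(3.4)  $\Lambda^{(x)}(s) = \Lambda^x(s)$.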


It follows that our task has been reduced to finding the probabilities
$\lambda_n$ for the first passage through $x = 1$. If $X_1 = 1$ then this first
passage takes place at the first trial. If $X_1 = -1$ the cumulative net
gains $X_2$, $X_2 + X_3$, ... after the first trial must increase by two units,
and we conclude that

(3.5)  $\lambda_1 = p, \qquad \lambda_n = q \lambda_{n-1}^{(2)}, \quad n > 1$.

This is obviously equivalent to

(3.6)  $\Lambda(s) = ps + qs\Lambda^2(s)$,

which is a quadratic equation for $\Lambda(s)$. Of the two roots one is
unbounded near $s = 0$, and the unique bounded solution of (3.6) is

(3.7)  $\Lambda(s) = \dfrac{1 - \{1 - 4pqs^2\}^{1/2}}{2qs}$.


We have thus found the generating functions (3.4) of all first passage
times. The binomial expansion II(8.7) enables us to write down the
coefficients

(3.8)  $\lambda_{2n-1} = \dfrac{(-1)^{n+1}}{2q} \binom{1/2}{n} (4pq)^n, \qquad \lambda_{2n} = 0,$

but we are not interested in explicit expressions; it is more instructive
to extract the relevant information directly from the generating
function.

First note that

(3.9)  $\Lambda(1) = \dfrac{1 - |p - q|}{2q}$

and so $\Lambda(1) = 1$ if $p \ge q$ but $\Lambda(1) = p/q$ if $p < q$. We conclude that
$\sum \lambda_n$ equals 1 or $p/q$, whichever is smaller; when $q$ is larger than $p$ (a game
unfavorable to Peter), the probability that the sums $S_n$ never become
positive equals $(q - p)/q$.
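
The recursion formed by (3.2) and (3.5) is easy to run numerically, and it offers a check on both (3.8) and (3.9). The following sketch is an illustration only; the function names, the cut-off of 400 trials, and the sample values of $p$ are assumptions.

```python
from fractions import Fraction

def first_passage(p, n_max):
    """lam[n] = probability of a first passage through +1 at trial n."""
    q = 1 - p
    lam = [p - p] * (n_max + 1)          # zeros of the same type as p; lam[0] = 0
    lam[1] = p
    for n in range(2, n_max + 1):
        # (3.2): first passage through 2 at trial n - 1
        lam2 = sum(lam[r] * lam[n - 1 - r] for r in range(1, n - 1))
        lam[n] = q * lam2                # (3.5)
    return lam

def half_binom(n):
    """Generalized binomial coefficient C(1/2, n)."""
    c = Fraction(1)
    for i in range(n):
        c *= (Fraction(1, 2) - i) / (i + 1)
    return c

def closed_form(p, n):
    """lambda_{2n-1} according to (3.8)."""
    q = 1 - p
    return (-1) ** (n + 1) * half_binom(n) * (4 * p * q) ** n / (2 * q)

p = Fraction(2, 5)
lam = first_passage(p, 15)
print(all(lam[2 * n - 1] == closed_form(p, n) for n in range(1, 9)))

# Partial sums approach min(1, p/q), in agreement with (3.9).
print(sum(first_passage(0.3, 400)), 0.3 / 0.7)
print(sum(first_passage(0.7, 400)))
```

Exact fractions are used for the comparison with (3.8); the partial sums are computed in floating point because only their limiting value matters.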

In the symmetric case $p = q = 1/2$ and $\sum \lambda_n = 1$; in a prolonged
sequence of coin tossings Peter is sure that he will sooner or later realize a
positive gain. The question is: How long will it take? From $\Lambda'(1) = \infty$
we conclude that in coin tossing the number of trials preceding the first
passage through 1 has infinite expectation. If Peter hopes to realize a
unit gain by participating in a coin-tossing game and quitting at the
first opportune moment, he should expect that an enormous number
of trials (and, in consequence, an enormous capital) will be required.
Needless to say, the infinite expectation of the first-passage time
is closely connected with the unexpected characteristics of the
fluctuations in coin tossing discussed at great length in chapter III.


Note. We are now in possession of an explicit formula for $\lambda_n$ but there remains
the task to calculate the first passage probabilities $\lambda_n^{(x)}$ from (3.3) or (3.4). The
standard analytic procedure for that consists in applying complex variable methods.
It is therefore interesting to remark that simple applications of the reflection
principle enabled us in theorem 2 of chapter III, section 4, to write down an explicit
expression for $\lambda_n^{(x)}$ at least in the symmetric case $p = q = 1/2$. (With the notations
used in chapter III we have f§ = %)_,.) A glance at (3.4) and (3.7) reveals the 
pleasing feature that for arbitrary $p$ the probability $\lambda_n^{(x)}$ equals the corresponding
probability in the symmetric case multiplied by $(4pq)^{n/2} (p/q)^{x/2}$. It is instructive
to follow this case in detail and realize that a most elementary combinatorial
argument enabled us to solve a difficult technical problem and that it replaces a formidable
analytical apparatus.


(b) Recurrence Times 


We shall say that a first return to zero occurs at the nth trial if $S_1 \ne 0$,
$S_2 \ne 0$, ..., $S_{n-1} \ne 0$, $S_n = 0$ (i.e., if the first equalization of the
accumulated numbers of successes and failures occurs). Let $f_n$ be
the probability of this event. (Clearly $f_{2n-1} = 0$ for all $n$. The first
few $f_{2n}$ are easily found by direct enumeration: $f_2 = 2pq$, $f_4 = 2p^2q^2$,
$f_6 = 4p^3q^3$, $f_8 = 10p^4q^4$.)

Let $\lambda_n^{(-1)}$ be the probability of a first passage through $x = -1$ at
the nth trial; in other words, $\lambda_n^{(-1)}$ is the quantity obtained from
$\lambda_n^{(1)} = \lambda_n$ by interchanging $p$ and $q$. As above we note that a return
to zero in $n$ trials is equivalent to a first passage through either $+1$
or $-1$ in the $n - 1$ trials following the first trial, and we conclude

(3.10)  $f_n = q \lambda_{n-1} + p \lambda_{n-1}^{(-1)}$.


Multiply by $s^n$ and add. Observing that the generating functions of
$\{\lambda_n\}$ and $\{\lambda_n^{(-1)}\}$ are obtained from each other by interchanging $p$
and $q$, we get

(3.11)  $F(s) = \sum f_n s^n = 1 - (1 - 4pqs^2)^{1/2}$.


We conclude: The probability $\sum f_n$ that the accumulated numbers of
successes and failures will ever equalize is $1 - |p - q|$.

In the special case $p = q = 1/2$ we find that $\sum f_n = 1$ but the
probability distribution $\{f_n\}$ has infinite expectation. The probabilities $f_{2n}$
were calculated, by entirely different methods, in chapter III, section 4.
It is illuminating to note that several theorems of chapter III can be
obtained without calculation and without explicit expressions for $f_{2n}$,
directly from the generating function $F(s)$. (See problems 6-10.)
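
The relation (3.10) and the conclusion drawn from (3.11) can be checked by computing the $f_n$ directly. In the sketch below the first passage probabilities are obtained from the recursion (3.2), (3.5); the function names, the value $p = 0.3$, and the cut-off of 400 trials are assumptions of the illustration.

```python
def first_passage(p, n_max):
    """Probabilities of a first passage through +1 at trials 0, ..., n_max."""
    q = 1.0 - p
    lam = [0.0] * (n_max + 1)
    lam[1] = p
    for n in range(2, n_max + 1):
        lam[n] = q * sum(lam[r] * lam[n - 1 - r] for r in range(1, n - 1))
    return lam

def first_return(p, n_max):
    """f[n] = probability of a first return to zero at trial n, from (3.10)."""
    q = 1.0 - p
    up = first_passage(p, n_max)      # first passage through +1
    down = first_passage(q, n_max)    # through -1: interchange p and q
    return [0.0] + [q * up[n - 1] + p * down[n - 1] for n in range(1, n_max + 1)]

p = 0.3
q = 1.0 - p
f = first_return(p, 400)
print(f[2], 2 * p * q)                # first enumerated value
print(sum(f), 1 - abs(p - q))         # probability of ever equalizing
```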


Note. Conceptually, the problem of this section is analogous to the waiting
time problem of example (2.d). In the sample space of infinite sequences of Bernoulli
trials we may consider the random variable $N_r$ defined as the number of trials from
the first passage through $r - 1$ up to and including the first passage through $r$.
The $\{N_r\}$ are mutually independent variables with the common generating function
$\Lambda(s)$. The sum $N^{(x)} = N_1 + \cdots + N_x$ is the waiting time for the first passage
through $x$ and has the generating function $\Lambda^x(s)$. We have formally avoided
referring to infinite sample spaces by defining the random variables in terms of their
distributions. From an analytic point of view the theory is rigorous and
self-contained, but for the probabilistic interpretation and for the intuition it is
preferable to keep the natural infinite sample space in mind.


4. PARTIAL FRACTION EXPANSIONS


Given a generating function $P(s) = \sum p_k s^k$ the coefficients $p_k$ can be
found by differentiations from the obvious formula $p_k = P^{(k)}(0)/k!$.
In practice it may be impossible to obtain explicit expressions and,
anyhow, such expressions are frequently so complicated that reasonable
approximations are preferable. The most common method for
obtaining such approximations is based on partial fraction expansions.
It is known from the theory of complex variables that a large class of
functions admits of such expansions, but we shall limit our exposition
to the simple case of rational functions.

Suppose then that the generating function is of the form

(4.1)  $P(s) = \dfrac{U(s)}{V(s)}$

where $U(s)$ and $V(s)$ are polynomials without common roots. For
simplicity let us first assume that the degree of $U(s)$ is lower than the
degree of $V(s)$, say $m$. Moreover, suppose that the equation $V(s) = 0$
has $m$ distinct (real or imaginary) roots $s_1, s_2, \ldots, s_m$. Then

(4.2)  $V(s) = (s - s_1)(s - s_2) \cdots (s - s_m)$,


and it is known from algebra that $P(s)$ can be decomposed into partial
fractions

(4.3)  $P(s) = \dfrac{\rho_1}{s_1 - s} + \dfrac{\rho_2}{s_2 - s} + \cdots + \dfrac{\rho_m}{s_m - s}$

where $\rho_1, \rho_2, \ldots, \rho_m$ are constants. To find $\rho_1$ multiply (4.3) by
$s_1 - s$; as $s \to s_1$ the product $(s_1 - s)P(s)$ tends to $\rho_1$. On the other
hand, from (4.1) and (4.2) we get

(4.4)  $(s_1 - s)P(s) = \dfrac{-U(s)}{(s - s_2)(s - s_3) \cdots (s - s_m)}$.

As $s \to s_1$ the numerator tends to $-U(s_1)$ and the denominator to
$(s_1 - s_2)(s_1 - s_3) \cdots (s_1 - s_m)$, which is the same as $V'(s_1)$. Thus
$\rho_1 = -U(s_1)/V'(s_1)$. The same argument applies to all roots, so that
for $k \le m$

(4.5)  $\rho_k = \dfrac{-U(s_k)}{V'(s_k)}$.


Unfortunately, extensive numerical calculation is usually required
to put (4.1) into the form (4.3). However, once the expansion (4.3)
is obtained, we can easily derive an exact expression for the coefficient
of $s^n$ in $P(s)$. Write

(4.6)  $\dfrac{1}{s_k - s} = \dfrac{1}{s_k} \cdot \dfrac{1}{1 - s/s_k}$.

For $|s| < |s_k|$ we expand the last fraction into a geometric series

(4.7)  $\dfrac{1}{1 - s/s_k} = 1 + \dfrac{s}{s_k} + \left(\dfrac{s}{s_k}\right)^2 + \left(\dfrac{s}{s_k}\right)^3 + \cdots$.

Introducing these expressions into (4.3), we find for the coefficient $p_n$
of $s^n$

(4.8)  $p_n = \dfrac{\rho_1}{s_1^{n+1}} + \dfrac{\rho_2}{s_2^{n+1}} + \cdots + \dfrac{\rho_m}{s_m^{n+1}}$.

Thus, to get $p_n$ we have first to find the roots $s_1, \ldots, s_m$ of the
denominator and then to determine the coefficients $\rho_1, \ldots, \rho_m$ from
(4.5).

In (4.8) we have an exact expression for the probability $p_n$. The
labor involved in calculating all $m$ roots is usually prohibitive, and
therefore formula (4.8) is primarily of theoretical interest. Fortunately
a single term in (4.8) almost always provides a satisfactory
approximation. In fact, suppose that $s_1$ is a root which is smaller in absolute
value than all other roots. Then the first denominator in (4.8) is
smallest. Clearly, as $n$ increases, the proportionate contributions of
the other terms decrease and the first term preponderates. In other
words, if $s_1$ is a root of $V(s) = 0$ which is smaller in absolute value than
all other roots, then, as $n \to \infty$,

(4.9)  $p_n \sim \dfrac{\rho_1}{s_1^{n+1}}$

(the sign $\sim$ indicating that the ratio of the two sides tends to 1).
Usually this formula provides surprisingly good approximations even
for relatively small values of $n$. The main advantage of (4.9) lies in
the fact that it requires the computation of only one root of an algebraic
equation.

It is easy to remove the restrictions under which we have derived
the asymptotic formula (4.9). To begin with, the degree of the
numerator in (4.1) may exceed the degree $m$ of the denominator. Let $U(s)$
be of degree $m + r$ ($r \ge 0$); a division reduces $P(s)$ to a polynomial of
degree $r$ plus a fraction $U_1(s)/V(s)$ in which $U_1(s)$ is a polynomial of a
degree lower than $m$. The polynomial affects only the first $r + 1$ terms
of the distribution $\{p_n\}$, and $U_1(s)/V(s)$ can be expanded into partial
fractions as explained above. Thus (4.9) remains true. Secondly, the
restriction that $V(s)$ should have only simple roots is unnecessary. It
is known from algebra that every rational function admits of an
expansion into partial fractions. If $s_k$ is a double root of $V(s)$, then the
partial fraction expansion (4.3) will contain an additional term of the
form $a/(s - s_k)^2$, and this will contribute a term of the form
$a(n + 1)s_k^{-(n+2)}$ to the exact expression (4.8) for $p_n$. However, this
does not affect the asymptotic expansion (4.9), provided only that $s_1$
is a simple root. We note this result for future reference as a

Theorem. If $P(s)$ is a rational function with a simple root $s_1$ of the
denominator which is smaller in absolute value than all other roots, then
the coefficient $p_n$ of $s^n$ is given asymptotically by $p_n \sim \rho_1 s_1^{-(n+1)}$, where
$\rho_1$ is defined in (4.5).

A similar asymptotic expansion exists also in the case where $s_1$ is a
multiple root. (See problem 25.)


Examples. (a) Let $a_n$ be the probability that $n$ Bernoulli trials
result in an even number of successes. This event occurs if an initial
failure at the first trial is followed by an even number of successes or
if an initial success is followed by an odd number. Therefore

(4.10)  $a_n = q a_{n-1} + p(1 - a_{n-1}), \qquad a_0 = 1$.

Multiplying by $s^n$ and adding we get the relation
$A(s) - 1 = qsA(s) + ps(1 - s)^{-1} - psA(s)$ for the generating function $A(s)$. Hence

(4.11)  $2A(s) = \{1 - s\}^{-1} + \{1 - (q - p)s\}^{-1}, \qquad 2a_n = 1 + (q - p)^n$.

Observe that the last formula is in every way preferable to the obvious
answer $a_n = b(0; n, p) + b(2; n, p) + \cdots$.
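
The three descriptions of $a_n$ (the recursion (4.10), the closed form in (4.11), and the sum of binomial terms) can be compared directly. The sketch below uses exact fractions; the function names and the value $p = 1/3$ are assumptions made for the illustration.

```python
from fractions import Fraction
from math import comb

def even_by_recursion(n, p):
    q = 1 - p
    a = Fraction(1)                   # a_0 = 1
    for _ in range(n):
        a = q * a + p * (1 - a)       # recursion (4.10)
    return a

def even_closed_form(n, p):
    q = 1 - p
    return (1 + (q - p) ** n) / 2     # closed form from (4.11)

def even_by_binomial_sum(n, p):
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(0, n + 1, 2))

p = Fraction(1, 3)
for n in range(6):
    print(n, even_by_recursion(n, p), even_closed_form(n, p), even_by_binomial_sum(n, p))
```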

(b) Let $q_n$ be the probability that in $n$ tosses of an ideal coin no run
of three consecutive heads appears. (Note that $\{q_n\}$ is not a probability
distribution; if $p_n$ is the probability that the first run of three
consecutive heads ends at the nth trial, then $\{p_n\}$ is a probability distribution,
and $q_n$ represents its "tails," $q_n = p_{n+1} + p_{n+2} + \cdots$.)

We can easily show that $q_n$ satisfies the recurrence formula

(4.12)  $q_n = \tfrac{1}{2} q_{n-1} + \tfrac{1}{4} q_{n-2} + \tfrac{1}{8} q_{n-3}$.

In fact, the event that $n$ trials produce no sequence HHH can occur
only when the trials begin with T, HT, or HHT. The probabilities
that the following trials lead to no run HHH are $q_{n-1}$, $q_{n-2}$, and $q_{n-3}$,
respectively, and the right side of (4.12) therefore contains the
probabilities of the three mutually exclusive ways in which the event "no
run HHH" can occur.

Evidently $q_0 = q_1 = q_2 = 1$, and hence the $q_n$ can be calculated
successively from (4.12). To obtain the generating function
$Q(s) = \sum q_n s^n$ we multiply both sides by $s^n$ and add. We get

$Q(s) - 1 - s - s^2 = \dfrac{s}{2}\{Q(s) - 1 - s\} + \dfrac{s^2}{4}\{Q(s) - 1\} + \dfrac{s^3}{8} Q(s)$

or

(4.13)  $Q(s) = \dfrac{2s^2 + 4s + 8}{8 - 4s - 2s^2 - s^3}$.


The denominator has the root $s_1 = 1.0873778\ldots$ and two complex
roots. For $|s| < s_1$ we have $|4s + 2s^2 + s^3| < 4s_1 + 2s_1^2 + s_1^3 = 8$,
and the same inequality holds also when $|s| = s_1$ unless $s = s_1$. Hence
the other two roots exceed $s_1$ in absolute value. Thus, from (4.9)

(4.14)  $q_n \sim \dfrac{1.236840}{(1.0873778)^{n+1}}$

where the numerator equals $(2s_1^2 + 4s_1 + 8)/(4 + 4s_1 + 3s_1^2)$. This
formula gives remarkably good approximations even for small values
of $n$. It approximates $q_3 = 0.875$ by 0.8847 and $q_4 = 0.8125$ by
0.81360. The percentage error decreases steadily, and $q_{12} = 0.41626\ldots$
is given correct to five decimal places.
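
The numbers quoted in example (b) can be reproduced with a few lines of code. The sketch below computes $q_n$ from the recurrence (4.12), locates the smallest root of the denominator of (4.13) by Newton's method (a choice made here for convenience, not a method prescribed by the text), and evaluates the one-term approximation (4.9). All function names and the starting point of the iteration are assumptions.

```python
def no_run_probs(n_max):
    """q_n from (4.12) with q_0 = q_1 = q_2 = 1."""
    q = [1.0, 1.0, 1.0]
    for n in range(3, n_max + 1):
        q.append(q[n - 1] / 2 + q[n - 2] / 4 + q[n - 3] / 8)
    return q

def U(s):
    return 2 * s * s + 4 * s + 8          # numerator of (4.13)

def V(s):
    return 8 - 4 * s - 2 * s * s - s**3   # denominator of (4.13)

def dV(s):
    return -4 - 4 * s - 3 * s * s         # derivative of the denominator

s1 = 1.0                                   # starting guess for Newton's method
for _ in range(50):
    s1 -= V(s1) / dV(s1)

rho1 = -U(s1) / dV(s1)                     # coefficient from (4.5)

def approx(n):
    return rho1 / s1 ** (n + 1)            # one-term approximation (4.9)

q = no_run_probs(12)
print(s1, rho1)
for n in (3, 4, 12):
    print(n, q[n], approx(n))
```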


5. BIVARIATE GENERATING FUNCTIONS 


For a pair of integral-valued random variables $X$, $Y$ with a joint
distribution of the form

(5.1)  $P\{X = j, Y = k\} = p_{jk}, \qquad j, k = 0, 1, \ldots,$

we define a generating function depending on two variables

(5.2)  $P(s_1, s_2) = \sum_{j,k} p_{jk} s_1^j s_2^k$.


Such generating functions will be called bivariate for short. 

The considerations of the first two sections apply without essential 
modifications, and it will suffice to point out three properties evident 
from (5.2): 

(a) The generating functions of the marginal distributions $P\{X = j\}$
and $P\{Y = k\}$ are $A(s) = P(s, 1)$ and $B(s) = P(1, s)$.

(b) The generating function of $X + Y$ is $P(s, s)$.

(c) The variables $X$ and $Y$ are independent if, and only if,
$P(s_1, s_2) = A(s_1) B(s_2)$ for all $s_1$, $s_2$.
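
In terms of coefficients, the three properties say that the marginals are row and column sums of $\{p_{jk}\}$, that the distribution of $X + Y$ collects the $p_{jk}$ with equal $j + k$, and that independence means $p_{jk} = a_j b_k$. The sketch below checks this on two small joint distributions; the distributions and all function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

a = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # assumed distribution of X
b = [Fraction(3, 4), Fraction(1, 4)]                   # assumed distribution of Y
joint = {(j, k): a[j] * b[k] for j, k in product(range(3), range(2))}
dependent = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

def marginals(p):
    """Coefficients of P(s, 1) and P(1, s)."""
    nx = max(j for j, _ in p) + 1
    ny = max(k for _, k in p) + 1
    ax = [sum(p.get((j, k), 0) for k in range(ny)) for j in range(nx)]
    by = [sum(p.get((j, k), 0) for j in range(nx)) for k in range(ny)]
    return ax, by

def sum_distribution(p):
    """Coefficients of P(s, s): c_r is the sum of p_jk over j + k = r."""
    c = {}
    for (j, k), v in p.items():
        c[j + k] = c.get(j + k, 0) + v
    return [c.get(r, 0) for r in range(max(c) + 1)]

def is_independent(p):
    ax, by = marginals(p)
    return all(p.get((j, k), 0) == ax[j] * by[k]
               for j in range(len(ax)) for k in range(len(by)))

print(marginals(joint) == (a, b))
print(sum_distribution(joint))
print(is_independent(joint), is_independent(dependent))
```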


Examples. (a) Bivariate Poisson distribution. It is obvious that

(5.3)  $P(s_1, s_2) = e^{-a_1 - a_2 - b + a_1 s_1 + a_2 s_2 + b s_1 s_2}, \qquad a_i > 0, \ b > 0,$

has a power-series expansion with positive coefficients adding up to
unity. Accordingly $P(s_1, s_2)$ represents the generating function of a
bivariate probability distribution. The marginal distributions are
Poisson distributions with mean $a_1 + b$ and $a_2 + b$, respectively, but
the sum $X + Y$ has the generating function $e^{-a_1 - a_2 - b + (a_1 + a_2)s + bs^2}$ and
is not a Poisson variable. (It is a compound Poisson distribution; see
chapter XII, section 2.)

(b) Multinomial distributions. Consider a sequence of $n$ independent
trials, each of which results in $E_0$, $E_1$, or $E_2$ with respective probabilities
$p_0$, $p_1$, $p_2$. If $X_i$ is the number of occurrences of $E_i$, then $(X_1, X_2)$ has
a trinomial distribution with generating function $(p_0 + p_1 s_1 + p_2 s_2)^n$.


*6. THE CONTINUITY THEOREM


We know from chapter VI that the Poisson distribution $\{e^{-\lambda} \lambda^k / k!\}$
is the limiting form of the binomial distribution with the probability
$p$ depending on $n$ in such a way that $np \to \lambda$ as $n \to \infty$. Then
$b(k; n, p) \to e^{-\lambda} \lambda^k / k!$. The generating function of $\{b(k; n, p)\}$ is
$(q + ps)^n = \{1 - \lambda(1 - s)/n\}^n$. Taking logarithms, we see directly
that this generating function tends to $e^{-\lambda(1 - s)}$, which is the generating
function of the Poisson distribution. We shall show that this situation
prevails in general; a sequence of probability distributions converges
to a limiting distribution if and only if the corresponding generating
functions converge. Unfortunately, this theorem is of limited
applicability, since the most interesting limiting forms of discrete distributions
are continuous distributions (for example, the normal distribution
appears as a limiting form of the binomial distribution).


Continuity Theorem. Suppose that for every fixed $n$ the sequence
$a_{0,n}, a_{1,n}, a_{2,n}, \ldots$ is a probability distribution, that is,

(6.1)  $a_{k,n} \ge 0, \qquad \sum_{k=0}^{\infty} a_{k,n} = 1$.

In order that for every fixed $k$

(6.2)  $a_{k,n} \to a_k$

as $n \to \infty$, it is necessary and sufficient that for every $s$ with $0 < s < 1$

(6.3)  $A_n(s) \to A(s)$.

Here

(6.4)  $A_n(s) = \sum_{k=0}^{\infty} a_{k,n} s^k, \qquad A(s) = \sum_{k=0}^{\infty} a_k s^k$

denote the corresponding generating functions.


Note. If (6.2) holds, then automatically $0 \le a_k \le 1$ and $\sum a_k \le 1$.
The generating function $A(s)$ exists therefore at least for $|s| < 1$.
[...] vanish, then the limiting sequence vanishes identically. For $\{a_k\}$ to
be a probability distribution it is necessary and sufficient that $\sum a_k = 1$
or $A(1) = 1$.


* The contents of this section will not be used in the sequel.

Proof.² First, suppose that (6.2) holds. For fixed $s$ ($0 < s < 1$) and
fixed $\epsilon$ we can choose $r$ so that $s^r/(1 - s) < \epsilon$. Then

(6.5)  $|A_n(s) - A(s)| \le \sum_{k=0}^{r} |a_{k,n} - a_k| + 2\epsilon$.

The sum on the right contains only finitely many terms, each of which
tends to zero. Hence $|A_n(s) - A(s)|$ is arbitrarily small for $n$
sufficiently large. Next, assume that (6.3) holds. We use the well-known
fact³ that it is always possible to find a subsequence $\{a_{k,n_\nu}\}$ of the
given sequence of distributions which converges. If (6.2) were not true,
then it would be possible to extract two subsequences converging to
two different limiting sequences $\{a_k^*\}$ and $\{a_k^{**}\}$, and the corresponding
subsequences of $\{A_n(s)\}$ would converge to $A^*(s) = \sum a_k^* s^k$ and
$A^{**}(s) = \sum a_k^{**} s^k$, respectively. However, this is impossible in view
of the assumption (6.3). Therefore (6.3) implies (6.2).

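Examples. (a) The negative binomial distribution. We saw in
example (2.d) that the generating function of the distribution $\{f(k; r, p)\}$ is
$p^r (1 - qs)^{-r}$. Now let $\lambda$ be fixed, and let $p \to 1$, $q \to 0$, and $r \to \infty$ so that
$q = \lambda/r$. Then

(6.6)  $\left( \dfrac{p}{1 - qs} \right)^r = \left( \dfrac{1 - \lambda/r}{1 - \lambda s/r} \right)^r$.

Passing to logarithms, we see that the right side tends to $e^{-\lambda + \lambda s}$, which
is the generating function of the Poisson distribution $\{e^{-\lambda} \lambda^k / k!\}$.
Hence if $r \to \infty$ and $rq \to \lambda$, then

(6.7)  $f(k; r, p) \to e^{-\lambda} \dfrac{\lambda^k}{k!}$.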
(b) Bernoulli trials with variable probabilities. Consider $n$ independent
trials such that the kth trial results in success with probability $p_k$
and in failure with probability $q_k = 1 - p_k$. The number $S_n$ of
successes can be written as the sum $S_n = X_1 + \cdots + X_n$ of $n$ mutually
independent random variables $X_k$ with the distributions $P\{X_k = 0\} = q_k$,
$P\{X_k = 1\} = p_k$. The generating function of $X_k$ is $q_k + p_k s$, and
hence the generating function of $S_n$ is

(6.8)  $P(s) = (q_1 + p_1 s)(q_2 + p_2 s) \cdots (q_n + p_n s)$.

² The theorem is a special case of the continuity theorem for Laplace-Stieltjes
transforms, and the proof follows the general pattern. In the literature the
continuity theorem for generating functions is usually stated and proved under
unnecessary restrictions.

³ This is easily established by the "method of diagonals" due to G. Cantor and
found in all books on set theory. The statement is, incidentally, a special case of
a well-known theorem of Helly.


As an application of this scheme let us assume that each house in a
city has a small probability $p_k$ of burning on a given day. The sum
$p_1 + \cdots + p_n$ is the expected number of fires in the city, $n$ being the
number of houses. We have seen in chapter VI that if all $p_k$ are equal
and if the houses are stochastically independent, then the number of
fires is a random variable whose distribution is near the Poisson
distribution. We show now that this conclusion remains valid also under
the more realistic assumption that the probabilities $p_k$ are not equal.
This result should increase our confidence in the Poisson distribution
as an adequate description of phenomena which are the cumulative
effect of many improbable events ("successes"). Accidents and
telephone calls are typical examples.

We use the now familiar model of an increasing number $n$ of variables
where the probabilities $p_k$ depend on $n$ in such a way that the largest
$p_k$ tends to zero, but the sum $p_1 + p_2 + \cdots + p_n = \lambda$ remains
constant. Then from (6.8)

(6.9)  $\log P(s) = \sum_{k=1}^{n} \log \{1 - p_k(1 - s)\}$.

Since $p_k \to 0$, we can use the fact that $\log (1 - x) = -x - \theta x$, where
$\theta \to 0$ as $x \to 0$. It follows that

(6.10)  $\log P(s) = -(1 - s) \left\{ \sum_{k=1}^{n} (p_k + \theta_k p_k) \right\} \to -\lambda(1 - s)$,

so that $P(s)$ tends to the generating function of the Poisson
distribution. Hence, $S_n$ has in the limit a Poisson distribution. We conclude
that for large $n$ and moderate values of $\lambda = p_1 + p_2 + \cdots + p_n$ the
distribution of $S_n$ can be approximated by a Poisson distribution. [Cf.
example IX(5.b).]
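
The quality of the approximation can be examined numerically by multiplying out the product (6.8). The sketch below does so for unequal probabilities $p_k$ proportional to $k$; this particular choice, the values $n = 500$ and $\lambda = 2$, the truncation to `K` coefficients, and the function names are all assumptions made for the illustration.

```python
from math import exp, factorial

def success_count_distribution(probs, K):
    """First K coefficients of the product (6.8)."""
    dist = [1.0] + [0.0] * (K - 1)
    for p in probs:
        q = 1.0 - p
        # multiply the current polynomial by (q + p s), keeping K terms
        dist = [q * dist[0]] + [q * dist[k] + p * dist[k - 1] for k in range(1, K)]
    return dist

def poisson(lam, K):
    return [exp(-lam) * lam**k / factorial(k) for k in range(K)]

n, lam, K = 500, 2.0, 12
probs = [2 * lam * k / (n * (n + 1)) for k in range(1, n + 1)]   # unequal p_k with sum lam

exact = success_count_distribution(probs, K)
approx = poisson(lam, K)
print(max(probs))                                        # largest p_k is small
print(max(abs(x - y) for x, y in zip(exact, approx)))    # largest discrepancy
```

With these values the largest $p_k$ is below 0.01, and the two distributions agree closely in every term.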


7. PROBLEMS FOR SOLUTION 


1. Let $X$ be a random variable with generating function $P(s)$. Find the
generating functions of $X + 1$ and $2X$.

2. Continuation. Find the generating functions of (a) $P\{X \le n\}$, (b)
$P\{X < n\}$, (c) $P\{X \ge n\}$, (d) $P\{X > n + 1\}$, (e) $P\{X = 2n\}$.


3. In a sequence of Bernoulli trials let $u_n$ be the probability that the first
combination SF occurs at trials number $n - 1$ and $n$. Find the generating
function, mean, and variance.

4. Discuss which of the formulas of chapter II, section 12, represent
convolutions and where generating functions have been used.

5. Let $a_n$ be the number of ways in which the score $n$ can be obtained by
throwing a die any number of times. Show that the generating function of
$\{a_n\}$ is $\{1 - s - s^2 - s^3 - s^4 - s^5 - s^6\}^{-1}$.

Note: Problems 6-10 refer to coin tossing with the usual notations. They contain,
among other things, a straightforward derivation of certain relations found in chapter
III. We write $u_n = P\{S_n = 0\}$ and $f_n = P\{S_1 \ne 0, S_2 \ne 0, \ldots, S_{n-1} \ne 0,
S_n = 0\}$ (first return); by definition $u_0 = 1$, $f_0 = 0$. We assume known (from section
3) that $\{f_n\}$ has the generating function $F(s) = 1 - \{1 - s^2\}^{1/2}$, and nothing more.
The calculations are practically nil, and no explicit formulas for the coefficients are
required.

6. The generating function of $\{u_n\}$ is $U(s) = \{1 - s^2\}^{-1/2}$.

7. The probability that no zero occurs up to time $2n$ is the same as the
probability $u_{2n}$ that $S_{2n} = 0$.

8. The probability that $S_{2n} = 0$ and that all the sums $S_1, S_2, \ldots, S_{2n}$ are
$\ge 0$ equals $2f_{2n+2}$.

9. The probability that the first change of sign occurs following the 2nth
trial equals $2f_{2n+2}$.

10. The probability that exactly $k$ among the sums $S_1, \ldots, S_n$ are zero
has the generating function $F^k(s) U(s)(1 + s)$.


11. In a sequence of Bernoulli trials with p > q let an be the probability 
that there exists an index j > n such that S; = 0. Show that an has the gen- 
erating function 4pqip — q + (1 — 4ps) (1 + 8). 

12. In the waiting time example IX(3.d) find the generating function of $S_r$
(for $r$ fixed). Verify formula IX(3.3) for the mean and calculate the variance.

13. Continuation. The following is an alternative method for deriving the
same result. Let $p_n(r) = P\{S_r = n\}$. Prove the recursion formula


aa pasa) = pale) +H at- D. 


Derive the generating function directly from (7.1). 

14. Solve the two preceding problems for r preassigned elements (instead 
of r arbitrary ones). 

15.⁴ Let the sequence of Bernoulli trials up to the first failure be called a
turn. Find the generating function and the probability distribution of the
accumulated number $S_r$ of successes in $r$ turns.

⁴ Problems 15-17 have a direct bearing on the game of billiards. The probability
$p$ of success is a measure of the player's skill. The player continues to play until
he fails. Hence the number of successes he accumulates is the length of his "turn."
The game continues until one player has scored $N$ successes. Problem 15 therefore
gives the probability distribution of the number of turns one player needs to score
$k$ successes, problem 16 the average duration, and problem 17 the probability of a
tie between two players. For further details cf. O. Bottema and S. C. Van Veen,
Kansberekeningen bij het biljartspel, Nieuw Archief voor Wiskunde (in Dutch), vol.
22 (1943), pp. 16-33 and 123-158.

16. Continuation. Let $R$ be the number of successive turns up to the
$\nu$th success (that is, the $\nu$th success occurs during the $R$th turn). Prove that

$$P\{R = r\} = p^{\nu} q^{r-1} \binom{\nu + r - 2}{\nu - 1}.$$

Find $E(R)$ and $\operatorname{Var}(R)$.


17. Continuation. Consider two sequences of Bernoulli trials with
probabilities $p_1, q_1$, and $p_2, q_2$, respectively. Show that the probability that the
same number of turns will lead to the $N$th success can be exhibited in either of
the forms:

$$(p_1 p_2)^N \sum_{\nu=1}^{\infty} \binom{N+\nu-2}{\nu-1}^{2} (q_1 q_2)^{\nu-1}
= (p_1 p_2)^N (1 - q_1 q_2)^{1-2N} \sum_{\nu=0}^{N-1} \binom{N-1}{\nu}^{2} (q_1 q_2)^{\nu}.$$


18. Let $\{X_k\}$ be mutually independent variables, each assuming the values
$0, 1, 2, \dots, a-1$ with probabilities $1/a$. Let $S_n = X_1 + \dots + X_n$. Show that
the generating function of $S_n$ is

$$P(s) = \left\{\frac{1 - s^a}{a(1 - s)}\right\}^{n}$$

and hence

$$P\{S_n = j\} = \frac{1}{a^n}\sum_{\nu=0}^{\infty} (-1)^{\nu + j + a\nu}\binom{n}{\nu}\binom{-n}{j - a\nu}.$$

(Only finitely many terms in the sum are different from zero.)
Note: For $a = 6$ we get the probability of scoring the sum $j + n$ in a throw
with $n$ dice. The solution goes back to De Moivre.


19. Continuation. The probability $P\{S_n \leq j\}$ has the generating function
$P(s)/(1 - s)$ and hence

$$P\{S_n \leq j\} = \frac{1}{a^n}\sum_{\nu} (-1)^{\nu + j + a\nu}\binom{n}{\nu}\binom{-n-1}{j - a\nu}.$$

20. Continuation: the limiting form. If $a \to \infty$ and $j \to \infty$, so that $j/a \to x$,
then

$$P\{S_n \leq j\} \to \frac{1}{n!}\sum_{\nu} (-1)^{\nu}\binom{n}{\nu}(x - \nu)^{n},$$

the summation extending over all $\nu$ with $0 \leq \nu < x$.

Note: This result is due to Lagrange. In the theory of geometric probabilities
the right-hand side represents the distribution function of the sum of $n$
independent random variables with "uniform" distribution in the interval (0, 1).

21. Let $u_n$ be the probability that the number of successes in $n$ Bernoulli
trials is divisible by 3. Find a recursive relation for $u_n$ and hence the
generating function.

22. Continuation: alternative method. Let $v_n$ and $w_n$ be the probabilities that
$S_n$ is of the form $3\nu + 1$ and $3\nu + 2$, respectively (so that $u_n + v_n + w_n = 1$).
Find three simultaneous recursive relations and hence three equations for the
generating functions.


23. Let $X$ and $Y$ be independent variables with generating functions $U(s)$
and $V(s)$. Show that $P\{X - Y = j\}$ is the coefficient of $s^j$ in $U(s)\,V(1/s)$,
where $j = 0, \pm 1, \pm 2, \dots$.

24. Moment generating functions. Let $X$ be a random variable with generating
function $P(s)$, and suppose that $\sum p_n s^n$ converges for some $s_0 > 1$. Then all
moments $m_r = E(X^r)$ exist, and the generating function $F(s)$ of the sequence
$m_r/r!$ converges at least for $|s| < \log s_0$. Moreover

$$F(s) = \sum_{r=0}^{\infty} \frac{m_r}{r!}\,s^r = P(e^s).$$

Note: $F(s)$ is usually called the moment generating function, although in
reality it generates $m_r/r!$.

25. Suppose that $A(s) = \sum a_n s^n$ is a rational function $U(s)/V(s)$ and that
$s_1$ is a root of $V(s)$, which is smaller in absolute value than all other roots.
If $s_1$ is of multiplicity $r$, show that

$$a_n \sim \frac{\rho_1}{s_1^{\,n+r}}\binom{n+r-1}{r-1},$$

where $\rho_1 = (-1)^r\, r!\, U(s_1)/V^{(r)}(s_1)$.

26. Bivariate negative binomial distributions. Show that for positive values
of the parameters $p_0^{a}\{1 - p_1 s_1 - p_2 s_2\}^{-a}$ is the generating function of the
distribution of a pair $(X, Y)$ such that the marginal distributions of $X$, $Y$,
and $X + Y$ are negative binomial distributions.⁵


⁵ Distributions of this type were used by G. E. Bates and J. Neyman in investigations
of accident proneness. See University of California Publications in Statistics,
vol. 1, 1952.


CHAPTER XII*


Compound Distributions. 
Branching Processes 


1. SUMS OF A RANDOM NUMBER OF VARIABLES 


Let $\{X_k\}$ be a sequence of mutually independent random variables
with the common distribution $P\{X_k = j\} = f_j$ and generating function
$f(s) = \sum f_j s^j$. We are often interested in sums $S_N = X_1 + X_2 + \dots + X_N$,
where the number $N$ of terms is a random variable independent
of the $X_k$. Let $P\{N = n\} = g_n$ be the distribution of $N$ and
$g(s) = \sum g_n s^n$ its generating function. For the distribution $\{h_j\}$ of $S_N$ we
get from the fundamental formula for conditional probabilities

(1.1)    $h_j = P\{S_N = j\} = \sum_{n=0}^{\infty} P\{N = n\}\, P\{X_1 + \dots + X_n = j\}.$

If $N$ assumes only finitely many values, the random variable $S_N$ is
defined on the sample space of finitely many $X_k$. Otherwise the
probabilistic definition of $S_N$ as a sum involves the sample space of an
infinite sequence $\{X_k\}$, but we shall be dealing only with the distribution
function of $S_N$: for our purposes we take the distribution (1.1) as
definition of the variable $S_N$ on the sample space with points $0, 1, 2, \dots$.

For a fixed $n$ the distribution of $X_1 + X_2 + \dots + X_n$ is given by the
$n$-fold convolution of $\{f_j\}$ with itself, and therefore (1.1) can be written
in the compact form

(1.2)    $\{h_j\} = \sum_{n=0}^{\infty} g_n \{f_j\}^{n*}.$


This formula can be simplified by the use of generating functions. The
generating function of $\{f_j\}^{n*}$ is $f^n(s)$ and it is obvious from (1.2) that
the generating function of the sum $S_N$ is given by

(1.3)    $h(s) = \sum_{j=0}^{\infty} h_j s^j = \sum_{n=0}^{\infty} g_n f^n(s).$

The right side is the Taylor expansion of $g(s)$ with $s$ replaced by $f(s)$;
hence it equals $g(f(s))$. This proves the


Theorem. The generating function of the sum $S_N = X_1 + \dots + X_N$
is the compound function $g(f(s))$.


* The contents of this chapter will not be used in the sequel.


Two special cases are of interest. 

(a) If the $X_k$ are Bernoulli variables with $P\{X_k = 1\} = p$ and
$P\{X_k = 0\} = q$, then $f(s) = q + ps$ and therefore $h(s) = g(q + ps)$.

(b) If $N$ has a Poisson distribution with mean $t$ then

(1.4)    $h(s) = e^{-t + tf(s)}.$

The distribution with this generating function will be called the
compound Poisson distribution.

If the $X_k$ are Bernoulli variables and $N$ has a Poisson distribution,
then $h(s) = e^{-tp + tps}$; the sum $S_N$ has a Poisson distribution with mean $tp$.
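The last statement can be checked numerically straight from (1.2). The following is only a sketch under stated assumptions: the helper names `convolve` and `compound_distribution` are illustrative, the Poisson distribution of $N$ is truncated at 60 terms, and the values $t = 2$, $p = 0.3$ are arbitrary.

```python
import math

def convolve(a, b, size):
    """Coefficients 0..size-1 of the product of two power series."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out

def compound_distribution(g, f, size):
    """h_j for j < size from (1.2): h = sum over n of g_n times the n-fold convolution of f."""
    h = [0.0] * size
    power = [1.0] + [0.0] * (size - 1)      # 0-fold convolution: unit mass at 0
    for g_n in g:
        for j in range(size):
            h[j] += g_n * power[j]
        power = convolve(power, f, size)     # next convolution power of f
    return h

t, p = 2.0, 0.3                              # arbitrary illustrative values
g = [math.exp(-t) * t**n / math.factorial(n) for n in range(60)]   # Poisson(t), truncated
f = [1 - p, p]                               # Bernoulli: f(s) = q + ps
h = compound_distribution(g, f, size=10)
poisson_tp = [math.exp(-t * p) * (t * p)**j / math.factorial(j) for j in range(10)]
```

Comparing `h` with `poisson_tp` term by term should show agreement up to rounding, in line with $h(s) = e^{-tp + tps}$.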


Examples. (a) We saw in example VI(7.c) that X-rays produce
chromosome breakages in cells; for a given dosage and time of exposure
the number $N$ of breakages in individual cells has a Poisson distribution.
Each breakage has a fixed probability $q$ of healing whereas with
probability $p = 1 - q$ the cell dies. Here $S_N$ is the "number of
observable breakages"¹ and has a Poisson distribution with mean $tp$.

(b) In animal-trapping experiments² $g_n$ represents the probability
that a species is of size $n$. If each animal has a fixed probability $p$ of
being trapped, then (assuming stochastic independence) the number
of trapped representatives of one species in the sample is a variable
$S_N$ with generating function $g(q + ps)$. This description can be varied
in many ways. For example, let $g_n$ be the probability of an insect's
laying $n$ eggs, and $p$ the probability of survival of an egg. Then $S_N$ is
the number of surviving eggs. Again, let $g_n$ be the probability of a
family's having $n$ children and let the sex ratio of boys to girls be
$p : q$. Then $S_N$ represents the number of boys in a family.


¹ See D. G. Catcheside, Genetic effects of radiations, Advances in Genetics, edited
by M. Demerec, vol. 2, Academic Press, New York, 1948, pp. 271-358, in particular
p. 339.

² D. G. Kendall, On some modes of population growth leading to R. A. Fisher's
logarithmic series distribution, Biometrika, vol. 35 (1948), pp. 6-15.

(c) Each plant has a large number of seeds, but each seed has only
a small probability of survival, and it is therefore reasonable to assume
that the number of survivors of an individual plant has a Poisson
distribution with some mean $\lambda$. If $g_n$ represents the distribution of the number
of parent plants, $g(e^{-\lambda + \lambda s})$ is the generating function of the number of
surviving seeds.


2. THE COMPOUND POISSON DISTRIBUTION 
We preface our considerations by two typical


Examples. (a) Suppose that the number of hits by lightning during
any time interval of duration $t$ is a Poisson variable with mean $\lambda t$.
If $\{f_n\}$ is the probability distribution of the damage caused by an
individual hit by lightning, then (assuming stochastic independence) the
probability distribution of the total damage during time $t$ is a compound
Poisson distribution

$$\{h_j\} = e^{-\lambda t}\sum_{n=0}^{\infty}\frac{(\lambda t)^n}{n!}\,\{f_j\}^{n*}$$

with generating function

(2.1)    $h(s; t) = e^{-\lambda t + \lambda t f(s)}.$


(b) In ecology it is assumed that the number of animal litters in a
plot has a Poisson distribution with mean proportional to the area $t$
of the plot. If $\{f_k\}$ is the distribution of the number of animals in a
litter, then (2.1) is the generating function for the total number of
animals in the plot.


We recall from chapter VI that many phenomena depending on time
or space obey a Poisson distribution, and the preceding examples
explain why the compound Poisson distribution is also frequently
connected with such phenomena.


The generating function (2.1) has the remarkable property that

(2.2)    $h(s; t_1 + t_2) = h(s; t_1)\,h(s; t_2).$

If $X(t)$ denotes a variable with generating function $h(s; t)$, this
means that a partitioning $t = t_1 + t_2$ of a period into two parts induces
a decomposition $X(t) = X(t_1) + X(t_2)$ of its contribution into a sum
of two independent random variables.

In the next section it will be shown that (among integral-valued random
variables) only the compound Poisson distribution has this property.
Here we preface the formulation of the theorem by two

Examples. (c) The negative binomial distribution with generating
function

(2.3)    $h(s; t) = \left(\frac{p}{1 - qs}\right)^{t}, \qquad p + q = 1,$

does have the property (2.2). Therefore the negative binomial (2.3) is a
compound Poisson distribution; it takes on the form (2.1) with

(2.4)    $\lambda = \log\frac{1}{p}, \qquad f(s) = \frac{1}{\lambda}\log\frac{1}{1 - qs} = \frac{1}{\lambda}\sum_{n=1}^{\infty}\frac{q^n s^n}{n}.$

The distribution $\{q^n/(\lambda n)\}$ is called the logarithmic distribution.

(d) Multiple Poisson distributions. Suppose that we classify automobile
accidents according to the number of vehicles involved as
singlets, doublets, etc. Suppose further that the numbers of singlets,
doublets, etc., have Poisson distributions with means $\lambda_1 t, \lambda_2 t, \dots$ and
that there is no stochastic dependence among them. The total number
of vehicles involved in accidents during a period $t$ has then the
generating function

(2.5)    $e^{-\lambda_1 t(1-s)}\, e^{-\lambda_2 t(1-s^2)}\, e^{-\lambda_3 t(1-s^3)} \cdots.$

This is again a compound Poisson distribution with $\lambda = \sum \lambda_i$ and
$f_i = \lambda_i/\lambda$. Conversely, every compound Poisson distribution can be
rewritten in the form (2.5) and therefore admits of the alternative
interpretation as representing the cumulative effect of singlets, doublets,
etc.
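The identification in example (c) of the negative binomial distribution with a compound Poisson distribution can be checked numerically. The sketch below makes its own assumptions: the helper names `convolve` and `compound_poisson` and the values $p = 0.6$, $t = 1.5$ are illustrative. Because the logarithmic distribution puts no mass at 0, the $n$-fold convolution starts at index $n$, so the first `size` coefficients need only the terms $n <$ `size`.

```python
import math

def convolve(a, b, size):
    """Coefficients 0..size-1 of the product of two power series."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        for j, bj in enumerate(b[:size - i]):
            out[i + j] += ai * bj
    return out

def compound_poisson(rate, f, size):
    """h_j, j < size, for the generating function exp(-rate + rate*f(s)); assumes f[0] == 0."""
    h = [0.0] * size
    power = [1.0] + [0.0] * (size - 1)       # 0-fold convolution of f
    weight = math.exp(-rate)                 # e^{-rate} rate^n / n! for n = 0
    for n in range(size):
        for j in range(size):
            h[j] += weight * power[j]
        power = convolve(power, f, size)
        weight *= rate / (n + 1)
    return h

p, t, size = 0.6, 1.5, 12                    # arbitrary illustrative values
q = 1 - p
lam = math.log(1 / p)                        # lambda of (2.4)
f = [0.0] + [q**n / (lam * n) for n in range(1, size)]   # logarithmic distribution
h = compound_poisson(lam * t, f, size)

# Coefficients of (p / (1 - qs))^t, written with the gamma function for non-integer t.
neg_binomial = [p**t * math.gamma(t + k) / (math.gamma(t) * math.factorial(k)) * q**k
                for k in range(size)]
```

The two lists `h` and `neg_binomial` should agree up to rounding, which is what (2.3) and (2.4) assert.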

3. INFINITELY DIVISIBLE DISTRIBUTIONS 

A probability distribution $\{h_i\}$, $i = 0, 1, \dots$, is called infinitely
divisible, if for each $n$ it can be represented as the $n$-fold convolution of a
probability distribution $\{\varphi_i\}$ with itself, that is, if its generating function
$h(s)$ has an $n$th root such that $h^{1/n}(s) = \varphi(s)$ generates a probability
distribution $\{\varphi_i\}$.

Note that if $h(s; t)$ satisfies (2.2), then $h(s; t) = h^n(s; t/n)$ and
therefore $h(s; t)$ is infinitely divisible for each $t$. The assertion of the
preceding section is contained in the following theorem (which is a
special case of an important general theorem of P. Lévy concerning
arbitrary probability distributions).

Theorem. If $\{h_i\}$ is infinitely divisible, then its generating function
can be written in the form (2.1) (say with $t = 1$).

[Note that $h^t(s) = h(s; t)$ satisfies (2.2).]

Proof. Suppose that $h^{1/n}(s)$ is a probability generating function for
each $n$. This is possible only if $h(0) = h_0 > 0$. Then $h(s)$ must be
positive in some interval $|s| < a < 1$ and in it we have $0 < 1 - h(s) < 1$.
It follows that $\log h(s) = \log(1 - \{1 - h(s)\})$ has the Taylor
series

(3.1)    $\log h(s) = \sum_{i=0}^{\infty} x_i s^i, \qquad -a < s < a.$

Putting $s = 0$, we see that $x_0 < 0$. We want to prove that all other
$x_i$ are non-negative. Assume the contrary, and let $r \geq 1$ be the smallest
index such that $x_r < 0$. To avoid clumsy formulas set

(3.2)    $A(s) = \sum_{\nu=1}^{r-1} x_\nu s^\nu, \qquad B(s) = \sum_{\nu=r+1}^{\infty} x_\nu s^\nu, \qquad \epsilon = \frac{1}{n},$

so that

(3.3)    $h^{1/n}(s) = e^{\epsilon x_0} \cdot e^{\epsilon A(s)} \cdot e^{\epsilon x_r s^r} \cdot e^{\epsilon B(s)}.$

By assumption $h^{1/n}(s) = \sum \varphi_i s^i$ where $\varphi_i \geq 0$. Consider in particular
the coefficient $\varphi_r$ of $s^r$. The power series $B(s)$ contains only powers of
order greater than $r$ and hence does not contribute to $\varphi_r$. Therefore
$\varphi_r$ is the coefficient of $s^r$ in

(3.4)    $e^{\epsilon x_0}\left(1 + \epsilon A(s) + \tfrac{1}{2}\epsilon^2 A^2(s) + \dots\right)\left(1 + \epsilon x_r s^r\right).$

Since $A(s)$ is a polynomial of degree $\leq r - 1$, the term $\epsilon A(s)$ contributes
nothing to the coefficient of $s^r$, and we see that

(3.5)    $\varphi_r = e^{\epsilon x_0}\left[\epsilon x_r + \epsilon^2 p(\epsilon)\right]$

where $p(\epsilon)$ is a polynomial in $\epsilon$. If $x_r < 0$ as assumed, the right
side of (3.5) will be negative for $\epsilon$ sufficiently small, and thus $\varphi_r < 0$,
which is impossible. This proves that $x_r \geq 0$ for $r = 1, 2, \dots$. Moreover,
$h(1) = 1$ and hence $\log h(1) = \sum x_i = 0$, that is, $-x_0 = x_1 + x_2 + \dots$.
To write $h(s)$ in the form (2.1) with $t = 1$ it suffices now
to put $-x_0 = \lambda$ and $f_i = x_i/\lambda$.
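The criterion used in the proof (all coefficients $x_i$ of $\log h(s)$ with $i \geq 1$ are non-negative) can be explored numerically. The sketch below is an illustration under assumptions of its own: the function name `log_series_coefficients` is hypothetical, and the coefficients are obtained from the identity $s\,h'(s) = h(s) \cdot s\,(\log h)'(s)$, that is $j h_j = \sum_{i=1}^{j} i\,x_i h_{j-i}$, which is a standard device not taken from the text. A geometric distribution serves as an infinitely divisible example and a binomial distribution as a counterexample.

```python
import math

def log_series_coefficients(h, size):
    """Coefficients x_0..x_{size-1} of log h(s), given h[0] > 0 and len(h) >= size.

    Solves j*h_j = sum_{i=1}^{j} i*x_i*h_{j-i} for x_j, one index at a time.
    """
    x = [math.log(h[0])] + [0.0] * (size - 1)
    for j in range(1, size):
        acc = j * h[j] - sum(i * x[i] * h[j - i] for i in range(1, j))
        x[j] = acc / (j * h[0])
    return x

size = 8
p, q = 0.6, 0.4                                            # arbitrary illustrative values
geometric = [p * q**k for k in range(size)]                # h(s) = p / (1 - qs)
binomial = [math.comb(3, k) * 0.5**3 for k in range(4)] + [0.0] * (size - 4)

x_geo = log_series_coefficients(geometric, size)           # expect x_n = q^n / n for n >= 1
x_bin = log_series_coefficients(binomial, size)            # log h = 3 log(1/2) + 3 log(1 + s)
```

For the geometric distribution the coefficients reproduce $q^n/n$ from (2.4) with $\lambda f_n = x_n$; for the binomial distribution the coefficient $x_2$ is negative, so by the theorem it is not infinitely divisible.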


4. EXAMPLES FOR BRANCHING PROCESSES 


We shall describe a chance process which serves as a simplified 
model of many empirical processes and also illustrates the usefulness 
of generating functions. In words the process may be described as 
follows. 

We consider particles which are able to produce new particles of like
kind. A single particle forms the original, or zero, generation. Every
particle has probability $p_k$ ($k = 0, 1, 2, \dots$) of creating exactly $k$ new
particles; the direct descendants of the $n$th generation form the $(n+1)$st
generation. The particles of each generation act independently of each
other. We are interested in the size of the successive generations.

A few illustrations may precede a rigorous formulation in terms of
random variables.

(a) Nuclear chain reactions. This application became familiar in
connection with the atomic bomb.³ The particles are neutrons, which
are subject to chance hits by other particles. Let $p$ be the probability
that the particle sooner or later scores a hit, thus creating $m$ particles;
then $q = 1 - p$ is the probability that the particle has no descendants;
that is, it remains inactive (is removed or absorbed in a different way).
In this scheme the only possible numbers of descendants are 0 and $m$,
and the corresponding probabilities are $q$ and $p$ (i.e., $p_0 = q$, $p_m = p$,
$p_j = 0$ for all other $j$). At worst, the first particle remains inactive
and the process never starts. At best, there will be $m$ particles of the
first generation, $m^2$ of the second, and so on. If $p$ is near one, the
number of particles is likely to increase very rapidly. Mathematically,
this number may increase indefinitely. Physically speaking, for very
large numbers of particles the probabilities of fission cannot remain
constant, and also stochastic independence no longer holds. However,
for ordinary chain reactions, the mathematical description "indefinitely
increasing number of particles" may be translated by "explosion."

(b) Survival of family names. Here (as often in life), only male
descendants count; they play the role of particles, and $p_k$ is the
probability for a newborn boy to become the progenitor of exactly $k$ boys.
Our scheme introduces two artificial simplifications. Fertility is subject
to secular trends, and therefore the distribution $\{p_k\}$ in reality
changes from generation to generation. Moreover, common inheritance
and common environment are bound to produce similarities among
brothers, which is contrary to our assumption of stochastic independence.
Our model can be refined to take care of these objections, but
the essential features remain unaffected. We shall derive the probability
of finding $k$ carriers of the family name in the $n$th generation
and, in particular, the probability of an extinction of the line. Survival
of family names appears to have been the first chain reaction studied
by probability methods. The problem was first treated by F. Galton
(1889); for a detailed account the reader is referred to A. Lotka's book.⁴
Lotka shows that American experience is reasonably well described by
the distribution $p_0 = 0.4825$, $p_k = (0.2126)(0.5893)^{k-1}$ ($k \geq 1$), which,
except for the first term, is a geometric distribution.


³ The following description follows E. Schroedinger, Probability problems in
nuclear chemistry, Proceedings of the Royal Irish Academy, vol. 51, sect. A, No. 1
(December 1945). There the assumption of spatial homogeneity is removed.

⁴ Théorie analytique des associations biologiques, vol. 2, Actualités scientifiques
et industrielles, No. 780 (1939), pp. 123-136, Hermann et Cie, Paris.

(c) Genes and mutations. Every gene of a given organism (cf. chapter
V, section 5) has a chance to reappear in 1, 2, 3, ... direct descendants,
and our scheme describes the process, neglecting, of course, variations
within the population and with time. This scheme is of particular
use in the study of mutations,⁵ or changes of form in a gene. A
spontaneous mutation produces a single gene of the new kind, which
plays the role of a zero-generation particle. The theory leads to estimates
of the chances of survival and of the spread of the mutant gene.
To fix ideas, consider (following R. A. Fisher) a corn plant which is
father to some 100 seeds and mother to an equal number. If the population
size remains constant, an average of two among these 200 seeds
will develop to a plant. Each seed has probability $\frac{1}{2}$ to receive a
particular gene. The probability of a mutant gene's being represented in
exactly $k$ new plants is therefore comparable to the probability of
exactly $k$ successes in 200 Bernoulli trials with probability $p = \frac{1}{200}$,
and it appears reasonable to assume that $\{p_k\}$ is, approximately, a
Poisson distribution with mean 1. If the gene carries a biological
advantage, we get a Poisson distribution with mean $\lambda > 1$.

(d) Waiting lines.⁶ The theory of branching processes is useful for
the analysis of fluctuations in waiting lines (in post offices, telephones,
etc.). A customer arriving at an empty counter and having no waiting
time is termed ancestor; the customers arriving during the ancestor's
service time and joining in the queue are his direct descendants. The
process continues as long as the queue lasts. In this example we are
interested in the total progeny up to the moment of expiration.
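The Poisson approximation invoked in example (c) is easy to inspect numerically. The following sketch simply tabulates the binomial probabilities for 200 trials with $p = 1/200$ next to the Poisson probabilities with mean 1; the variable names and the cut-off at $k < 8$ are illustrative choices, not part of the text.

```python
import math

n, p = 200, 1 / 200                           # figures of example (c)
binomial = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8)]
poisson = [math.exp(-1.0) / math.factorial(k) for k in range(8)]

# Largest absolute difference between the two sets of probabilities for k < 8.
max_gap = max(abs(b - c) for b, c in zip(binomial, poisson))

for k, (b, c) in enumerate(zip(binomial, poisson)):
    print(f"k={k}  binomial={b:.5f}  poisson={c:.5f}")
```

The printed columns should differ only in the third or fourth decimal place, which is the sense in which $\{p_k\}$ is "approximately" Poisson here.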


5. EXTINCTION PROBABILITIES IN BRANCHING 
PROCESSES 


For a mathematical description of the process let $X_n$ represent the
size of the $n$th generation. By assumption $X_0 = 1$, and $X_1$ has the given
probability distribution $\{p_k\}$ and generating function $P(s) = \sum p_k s^k$.
The second generation consists of the direct descendants of the $X_1$
members of the first generation; in other words, we consider $X_2$ as the
sum of $X_1$ mutually independent variables each having the generating
function $P(s)$. By the theorem of section 1 the generating function of
$X_2$ is therefore $P_2(s) = P(P(s))$. In like manner $X_3$ is the sum of $X_2$
variables each having the same distribution as $X_1$, and so the generating
function of $X_3$ is $P_3(s) = P(P_2(s))$. By induction we see that in general
the generating function $P_{n+1}(s)$ of the number $X_{n+1}$ of particles in the
$(n+1)$st generation is defined recursively by

(5.1)    $P_1(s) = P(s), \qquad P_{n+1}(s) = P(P_n(s)).$


⁵ R. A. Fisher, The genetical theory of natural selection, Oxford, 1930, pp. 738.
⁶ D. G. Kendall, Stochastic processes and population growth, Journal Royal
Statistical Society, vol. 11 (1949), pp. 230-265.


In example (4.a) $P(s) = q + ps^m$; and hence $P_2(s) = q + p(q + ps^m)^m$,
$P_3(s) = q + p\{q + p(q + ps^m)^m\}^m$, etc. For a Poisson distribution
$P(s) = e^{-\lambda + \lambda s}$, $P_2(s) = e^{-\lambda + \lambda e^{-\lambda + \lambda s}}$, etc. These formulas are not
very pleasing but enable us to draw important conclusions.

We seek the probability $x_n$ that the process terminates at or before
the $n$th generation, that is $x_n = P\{X_n = 0\} = P_n(0)$. No extinction
is possible when $p_0 = 0$ and we shall therefore assume that $0 < p_0 < 1$.
It is clear from its definition that $x_n$ increases with $n$. This can be seen
analytically as follows. In the interval $0 < s < 1$ the function $P(s)$ is
increasing and we have $x_1 = P(0) = p_0$. Therefore $x_2 = P(x_1) > P(0) = x_1$
and by induction $x_{n+1} = P(x_n) > P(x_{n-1}) = x_n$. It follows
that the sequence $x_n$ increases monotonically to a number $\zeta$, and
obviously $\zeta$ satisfies the equation

(5.2)    $\zeta = P(\zeta).$

If $u > 0$ is an arbitrary root of the equation $u = P(u)$, then
$x_1 = P(0) < P(u) = u$ and so by induction $x_{n+1} = P(x_n) < P(u) = u$,
which shows that $\zeta \leq u$. Accordingly, $x_n$ tends to the smallest positive
root of (5.2).
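The monotone iteration $x_{n+1} = P(x_n)$, $x_1 = P(0)$, translates directly into a few lines of code. The sketch below is illustrative only: the function names and the fixed number of 500 iterations are assumptions, and the parameter values for example (4.a) are arbitrary. Lotka's distribution from example (4.b) is summed in closed form as $P(s) = 0.4825 + 0.2126\,s/(1 - 0.5893\,s)$.

```python
def extinction_probability(P, generations=500):
    """Iterate x_{n+1} = P(x_n) from x_0 = 0; x_n = P_n(0) increases to the
    smallest positive root of (5.2)."""
    x = 0.0
    for _ in range(generations):
        x = P(x)
    return x

def fission(p, m=2):
    """Generating function q + p*s^m of example (4.a)."""
    return lambda s: (1 - p) + p * s**m

def lotka(s):
    """Lotka's distribution of example (4.b): p_0 = 0.4825, p_k = 0.2126 * 0.5893^(k-1)."""
    return 0.4825 + 0.2126 * s / (1 - 0.5893 * s)

zeta_super = extinction_probability(fission(0.75))   # s = 1/4 + (3/4) s^2 has roots 1/3 and 1
zeta_sub = extinction_probability(fission(0.40))     # s = 0.6 + 0.4 s^2 has no root below 1
zeta_lotka = extinction_probability(lotka)           # comes out a little above 0.8
```

For $m = 2$ the equation $s = q + ps^2$ has the roots 1 and $q/p$, so the first case can be checked by hand against $\zeta = 1/3$.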

The graph of $y = P(s)$ being convex, the curve and the bisector $y = s$
can intersect in at most two points. They do intersect at the point
(1, 1) and therefore the equation (5.2) can have at most one root
$0 < \zeta < 1$. When such a root exists, the difference ratio
$\{1 - P(\zeta)\}/\{1 - \zeta\}$ equals one, and by the mean value theorem there exists a
point $x$ lying between $\zeta$ and 1 such that the derivative $P'(x)$ is 1. It
follows that a root $\zeta < 1$ of (5.2) can exist only if $P'(1) > 1$. On the
other hand, if $P'(1) \leq 1$ then $\{1 - P(s)\}/\{1 - s\} < 1$ for all $s < 1$,
and this implies $P(s) > s$; the graph of $P(s)$ lies above the bisector
and hence (5.2) can have no root. This shows that a positive root
$\zeta < 1$ of (5.2) exists if, and only if, $P'(1) > 1$, and that this root is
unique. Now $P'(1) = \sum k p_k$ is the expected number of direct descendants
of each particle, and we can formulate the basic result:


Let $\mu = \sum k p_k$ be the expected number of direct descendants of a single
particle. If $\mu \leq 1$, then the probability tends to one that the process will
terminate before the $n$th generation (that is, $X_n = 0$). If $\mu > 1$, then
there exists a unique root $\zeta < 1$ of (5.2), and $\zeta$ is the limit of the
probability that the process terminates after finitely many generations.


The difference $1 - \zeta$ can be called the probability of an infinitely
prolonged process. Usually $x_n$ converges to $\zeta$ rapidly, so that a
terminating process is likely to proceed for only very few generations. In
practice, therefore, $\zeta$ is the probability of a rapid extinction. In
example (4.c) we may call $1 - \zeta$ the probability that a mutant gene
establishes itself. If we start with $r$ particles instead of a single one, the
probability that all $r$ descendant lines die out is $\zeta^r$, and the probability
of at least one being successful is $1 - \zeta^r$. Even if $\zeta$ is relatively large,
$1 - \zeta^r$ is near 1 if the initial number $r$ is large. In the nuclear chain
reaction of example (4.a) this is always the case, and hence we can say:
If $\mu > 1$, the probability of an explosion is near 1, but for $\mu \leq 1$ the
probability is 1 that the process stops after a finite number of generations.

We can also find the expected size of the $n$th generation
$E(X_n) = P'_n(1)$. Since $P_n(s) = P(P_{n-1}(s))$, we find

$$P'_n(1) = P'(P_{n-1}(1))\,P'_{n-1}(1) = P'(1)\,P'_{n-1}(1) = \mu E(X_{n-1}),$$

and generally by induction

(5.3)    $E(X_n) = \mu^n.$

Hence, if $\mu > 1$, we should expect an exponential growth. This argument
can be amplified. It is easily seen that not only $P_n(0) \to \zeta$ but
also $P_n(s) \to \zeta$ for all $s < 1$. This means that the coefficients of
$s, s^2, s^3, \dots$ tend to zero. After a large number of generations the
probability that no descendants exist is near $\zeta$, and the probability that
the number of descendants exceeds any preassigned bound is near $1 - \zeta$;
it is exceedingly improbable to find a moderate number of descendants.⁷
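When $P(s)$ is a polynomial, the recursion (5.1) can be carried out exactly on coefficient lists, which gives the full distribution of $X_n$ for small $n$ and a direct check of (5.3). The sketch below is an illustration under its own assumptions: the helper names `poly_mul`, `compose` and `generation_distribution` are hypothetical, and the offspring law is that of example (4.a) with $m = 2$ and the arbitrary value $p = 0.75$.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def compose(P, Q):
    """Coefficients of P(Q(s)), evaluated by Horner's scheme on polynomials."""
    out = [P[-1]]
    for c in reversed(P[:-1]):
        out = poly_mul(out, Q)
        out[0] += c
    return out

def generation_distribution(P, n):
    """Coefficient list of P_n(s) from (5.1): P_1 = P, P_{k+1} = P(P_k)."""
    Pn = list(P)
    for _ in range(n - 1):
        Pn = compose(P, Pn)
    return Pn

P = [0.25, 0.0, 0.75]                           # P(s) = q + p s^2 with p = 0.75
mu = sum(k * pk for k, pk in enumerate(P))      # expected number of direct descendants
P3 = generation_distribution(P, 3)              # distribution of X_3
mean3 = sum(k * pk for k, pk in enumerate(P3))  # should equal mu**3 by (5.3)
```

The entry `P3[0]` is $x_3 = P_3(0)$, the probability of extinction at or before the third generation, and `mean3` reproduces $\mu^3$.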


6. PROBLEMS FOR SOLUTION 


1. The distribution (1.1) has mean $E(N)E(X)$ and variance
$E(N)\operatorname{Var}(X) + \operatorname{Var}(N)E^2(X)$. Verify this (a) using the generating function, (b) directly
from the definition and the notion of conditional expectations.

2. Animal trapping [example (1.b)]. If $\{g_n\}$ is a geometric distribution, so
is the resulting distribution. If $\{g_n\}$ is a logarithmic distribution [cf. formula
(2.4)], there results a logarithmic distribution with an added term.


⁷ For the behavior of $X_n$ see T. E. Harris, Branching processes, Annals of
Mathematical Statistics, vol. 19 (1948), pp. 474-494.


3. In $N$ Bernoulli trials, where $N$ is a random variable with a Poisson
distribution, the numbers of successes and failures are stochastically independent
variables. Generalize this to the multinomial distribution (a) directly, (b)
using multivariate generating functions. [Cf. example IX(1.d).]

4. Randomization. Let $N$ have a Poisson distribution with mean $\lambda$, and
let $N$ balls be placed randomly into $n$ cells. Show without calculation that
the probability of finding exactly $m$ cells empty is

$$\binom{n}{m} e^{-\lambda m/n}\left(1 - e^{-\lambda/n}\right)^{n-m}.$$

5. Continuation.⁸ Show that when a fixed number $r$ of balls is placed
randomly into $n$ cells the probability of finding exactly $m$ cells empty equals the
coefficient of $e^{-\lambda}\lambda^r/r!$ in the expression above. (a) Discuss the connection
with moment generating functions (problem XI, 24). (b) Use the result for an
effortless derivation of formula II(11.7).

6. Mixtures of probability distributions. Let $\{f_j\}$ and $\{g_j\}$ be two
probability distributions, $\alpha > 0$, $\beta > 0$, $\alpha + \beta = 1$. Then $\{\alpha f_j + \beta g_j\}$ is again a
probability distribution. Discuss its meaning and the connection with the
urn models of chapter V, section 2. Generalize to more than two distributions.
Can such a mixture be a compound Poisson distribution?

7. In the branching process prove that $\operatorname{Var}(X_{n+1}) = \mu \operatorname{Var}(X_n) + \mu^{2n}\sigma^2$,
using (a) generating functions, (b) conditional expectations. Conclude that
$\operatorname{Var}(X_n) = \sigma^2(\mu^{2n-2} + \mu^{2n-3} + \dots + \mu^{n-1})$.

8. Continuation. If $n > m$ show that $E(X_m X_n) = \mu^{n-m}E(X_m^2)$.

9. Continuation. Show that the bivariate generating function of $X_m$, $X_n$ is
$P_m(s_1 P_{n-m}(s_2))$. Use this to verify the assertion in 8.


⁸ This elegant derivation of various combinatorial formulas by randomizing a
parameter is due to C. Domb, On the use of a random parameter in combinatorial
problems, Proceedings Royal Philosophical Society, Sec. A., vol. 65 (1952), pp.
305-309.


CHAPTER XIII 


Recurrent Events. 


The Renewal Equation 


1. INFORMAL PREPARATIONS AND EXAMPLES 


We shall be concerned with certain repetitive patterns connected
with repeated trials. Roughly speaking, a pattern $\mathcal{E}$ qualifies for the
following theory if after each occurrence of $\mathcal{E}$ the trials start from
scratch in the sense that the trials following an occurrence of $\mathcal{E}$ form
a replica of the whole experiment. The waiting times between successive
occurrences of $\mathcal{E}$ are mutually independent random variables having
the same distribution.

The simplest special case arises when $\mathcal{E}$ stands as abbreviation for
"a success occurs" in a sequence of Bernoulli trials. The waiting time
up to the first success has a geometric distribution; when the first success
occurs, the trials start anew, and the number of trials between the
$r$th and the $(r+1)$st success has the same geometric distribution. The
waiting time up to the $r$th success is the sum of $r$ independent variables
[example IX(3.c)]. By contrast, suppose that people are sampled one
by one and let $\mathcal{E}$ stand for "Two people in the sample have birthdays
the same day of the year." Here $\mathcal{E}$ is not repetitive; once it has occurred
it persists. The sampling may proceed until a second double birthday
turns up, but this second phase is not a replica of the first one. The
larger a sample, the greater the probability of a duplication of birthdays;
therefore a long waiting time for the first double birthday promises
a short interval between the first and the second duplication. The
two consecutive waiting times not only have different distributions but
are stochastically dependent. Such waiting times are not the object
of the theory of recurrent events.

A phenomenon of a different type occurs when we are interested in
the appearance of two consecutive successes in Bernoulli trials. The
first occurrence of the pattern SS is well defined, but if $\mathcal{E}$ stands for "a
run of exactly two successes," the third trial may undo the second; if
four successive trials produce the sequence SSSF, then $\mathcal{E}$ occurs at the
second trial, but the whole sequence contains no $\mathcal{E}$. For us it is important
that the event "$\mathcal{E}$ occurs at the $n$th trial" depends solely on
the outcome of the first $n$ trials and not on the future.

A few typical problems to which the theory of recurrent events does
apply are listed in the following


Examples. (a) Success runs in Bernoulli trials. The term "success
run of length $r$" has been defined in several ways. It is largely a matter
of convention and convenience whether a sequence of three consecutive
successes is said to contain 0, 1, or 2 runs of length 2, and for different
purposes different definitions have been adopted. However, if we are
to use the theory of recurrent events, then the notion of runs of length
$r$ must be defined so that we start from scratch every time a run is
completed. This means adopting the following definition. A sequence
of $n$ letters S and F contains as many runs of length $r$ as there are
non-overlapping uninterrupted successions of exactly $r$ letters S. In a sequence
of Bernoulli trials a run of length $r$ occurs at the $n$th trial, if the $n$th trial
adds a new run to the sequence. Thus in SSS|SF|SSS|SSS we have
three runs of length 3, and they occur at trials number 3, 8, 11; there
are five runs of length 2, and they occur at trials number 2, 4, 7, 9, 11.
This definition has the advantage of a considerable simplification of
the theory since runs of a fixed length become recurrent events. (This
topic will be taken up in sections 7 and 8.)
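The counting convention of example (a) amounts to resetting a counter whenever a run is completed. A minimal sketch, assuming trials are given as a string of the letters S and F (the function name `run_completions` is illustrative):

```python
def run_completions(trials, r):
    """Trial numbers (1-based) at which a success run of length r is completed,
    starting from scratch after every completed run."""
    completions = []
    streak = 0
    for n, outcome in enumerate(trials, start=1):
        streak = streak + 1 if outcome == "S" else 0
        if streak == r:
            completions.append(n)
            streak = 0              # the trials start anew after a completed run
    return completions

sequence = "SSSSFSSSSSS"            # the sequence SSS|SF|SSS|SSS of the text, bars removed
runs_of_3 = run_completions(sequence, 3)
runs_of_2 = run_completions(sequence, 2)
```

Applied to the sequence of the text this reproduces the trial numbers 3, 8, 11 for runs of length 3 and 2, 4, 7, 9, 11 for runs of length 2.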

(b) A counter problem.¹ Counters of the type used for cosmic rays
and $\alpha$-particles may be described by the following simplified model.
Bernoulli trials are performed at a uniform rate. A counter is designed
to register successes, but the mechanism is locked for exactly $r - 1$
trials following each registration. In other words, a success at the $n$th
trial is registered if, and only if, no registration has occurred in the
preceding $r - 1$ trials. The counter is then locked at the conclusion of
trials number $n, \dots, n + r - 1$, and is freed at the conclusion of the
$(n + r)$th trial provided this trial results in failure. The output of the
counter represents dependent trials; each registration has an aftereffect.
However, whenever the counter is free (not locked) the situation
is exactly the same, and the trials start from scratch. Letting $\mathcal{E}$
stand for "at the conclusion of the trial the counter is free," we have
a typical recurrent pattern (cf. problems 9 and 10 and XV, 13).

1 We are describing a discrete analogue of the so-called counters of type I. Type 
II is described in problem 10. For a description see H. Maier-Leibnitz, Die 
Koinzidenzmethode und ihre Anwendung auf kernphysikalische Probleme, Physikalische 
Zeitschrift, vol. 43 (1942), pp. 333-362. 

(c) Return to the origin. In a sequence of Bernoulli trials with probability 
p of success let ℰ stand as an abbreviation for “The cumulative 
numbers of successes and failures are equal.” As we have done before, 
we describe Bernoulli trials in terms of independent variables $\{X_k\}$ 
with the common distribution $P\{X_k = 1\} = p$ and $P\{X_k = -1\} = q$, 
and put 

(1.1)    $S_0 = 0, \qquad S_n = X_1 + X_2 + \dots + X_n.$

Then $S_n$ is the accumulated excess of heads over tails, and our ℰ occurs 
at the nth trial if, and only if, $S_n = 0$. We shall describe ℰ as the return 
to 0. Given that $S_n = 0$, the subsequent partial sums 

(1.2)    $S'_0 = S_n, \quad S'_1 = S_{n+1}, \quad S'_2 = S_{n+2}, \quad \dots$

are subject to exactly the same probability relations as the original 
sequence $\{S_k\}$; and a return to 0 for $\{S'_k\}$ means a return to 0 for 
$\{S_k\}$ and vice versa. 

The event “ℰ occurs for the first time at the nth trial” alias “The 
first return to the origin takes place at the nth trial” is defined as the 
aggregate of sequences $\{X_j\}$ such that 

(1.3)    $S_1 \neq 0, \quad S_2 \neq 0, \quad \dots, \quad S_{n-1} \neq 0, \quad S_n = 0.$

If this occurs we say that the waiting time T equals n, and for the 
probability of (1.3) we write $f_n = P\{T = n\}$. The first few terms are 
easily found by direct enumeration of all admissible sequences; clearly 
$f_n = 0$ whenever n is odd and $f_2 = 2pq$, $f_4 = 2p^2q^2$, $f_6 = 4p^3q^3$, 
$f_8 = 10p^4q^4$, $f_{10} = 28p^5q^5$. The same sequence $\{f_n\}$ represents the 
probability distribution of the waiting time between the rth and the 
(r+1)st occurrence of ℰ, and we call $\{f_n\}$ also the distribution of recurrence 
times. (The distribution $\{f_n\}$ has been found in chapter XI, section 3, 
by the use of generating functions. In chapter III the special 
case p = q = ½ is treated, and the formulas apply in general since the 
number of outcomes satisfying (1.3) is independent of p. In the present 
chapter we give a new and independent derivation.) 
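
The numerical coefficients 2, 2, 4, 10, 28 quoted for $f_2, \dots, f_{10}$ can be checked by the direct enumeration mentioned in the text. The following is only a sketch: the function name `first_return_count` is ours, and the code simply counts the ±1 sequences of length n that satisfy (1.3). Each such sequence of length n = 2m has probability $p^m q^m$, so the count is the coefficient of $p^m q^m$ in $f_{2m}$.

```python
from itertools import product

def first_return_count(n):
    """Count the +-1 sequences of length n whose partial sums vanish
    for the first time at step n, i.e. which satisfy (1.3)."""
    count = 0
    for steps in product((1, -1), repeat=n):
        s = 0
        admissible = True
        for i, x in enumerate(steps, start=1):
            s += x
            if s == 0 and i < n:      # an earlier return rules the sequence out
                admissible = False
                break
        if admissible and s == 0:
            count += 1
    return count

# Coefficients of p^m q^m in f_2, f_4, ..., f_10.
print([first_return_count(2 * m) for m in range(1, 6)])
```

The enumeration visits all $2^n$ sequences, so it is practical only for small n; it is meant as a check, not as a method.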

(d) Ladder points in Bernoulli trials. Adhering to the same notations 
we define a new repetitive pattern ℰ by “ℰ occurs at the nth trial if $S_n$ 
exceeds all preceding sums,” that is, if 

(1.4)    $S_n > 0, \quad S_n > S_1, \quad S_n > S_2, \quad \dots, \quad S_n > S_{n-1}.$

In this case we shall say that the nth trial (or the index n) represents 
a ladder point. In the sequence of partial sums $S_1, S_2, \dots$ given by 
−1, 0, 1 | 2 | 1, 2, 3 | 2, 1, 2, 1, 2, 3, 4 | 5 | (see figure 3 of chapter III) 
ladder points occur at the trials number 3, 4, 7, 14, 15, and the waiting 
times between consecutive occurrences are 3, 1, 3, 7, 1. The rth 
occurrence of ℰ can be described as the first occurrence of the value r, 
and therefore the ladder points may be described as moments of first 
passages. 
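
Condition (1.4) translates directly into a running-maximum test. The sketch below (the helper name `ladder_points` is an assumption of ours) recovers the ladder points and waiting times of the sequence of partial sums quoted above.

```python
def ladder_points(partial_sums):
    """Return the 1-based trial numbers n for which S_n > 0 and S_n exceeds
    all preceding partial sums, as required by (1.4)."""
    points = []
    running_max = 0                   # S_0 = 0, so a ladder point must exceed 0
    for n, s in enumerate(partial_sums, start=1):
        if s > running_max:
            points.append(n)
            running_max = s
    return points

sums = [-1, 0, 1, 2, 1, 2, 3, 2, 1, 2, 1, 2, 3, 4, 5]
points = ladder_points(sums)
waits = [later - earlier for earlier, later in zip([0] + points, points)]
print(points)   # [3, 4, 7, 14, 15]
print(waits)    # [3, 1, 3, 7, 1]
```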

If ℰ occurs at the nth trial the process starts from scratch in the 
following sense. Assuming (1.4) to hold, a later trial number n + m is 
a ladder point if, and only if, 

(1.5)    $S_{n+m} > S_n, \quad S_{n+m} > S_{n+1}, \quad S_{n+m} > S_{n+2}, \quad \dots, \quad S_{n+m} > S_{n+m-1}.$

Put 

(1.6)    $S_k^* = S_{n+k} - S_n = X_{n+1} + \dots + X_{n+k}.$

Then n + m is a ladder point for the sequence $S_1, S_2, \dots$ if, and only 
if, m is a ladder point for $\{S_k^*\}$. Clearly the operation defined in (1.6) 
produces an independent replica of the original sample space, and ℰ 
qualifies for the theory of recurrent events. Note that in this case the 
sequence (1.2) as such is probabilistically different from the original 
sequence: after the rth occurrence of ℰ the partial sums $S_j$ are bound 
to be close to r and not to 0. Nevertheless, as far as our pattern ℰ is 
concerned, the trials following the occurrence of ℰ start from scratch. 

(The ladder points provide a means of reducing the study of first- 
passage times to recurrent events, that is, to the summation of inde- 
pendent random variables. A direct (equivalent) approach is given 
in chapter IX, section 3. The notion of ladder points can be used prof- 
itably for sequences of arbitrary random variables, for example in 
connection with the general arc sine law.) 

(e) In a sequence of consecutive throws of a perfect die let ℰ stand 
for “Ones, twos, ..., sixes appeared in equal numbers.” Here the 
recurrent character of ℰ requires no further comment. 


2. DEFINITIONS 


We consider a sequence of repeated trials with possible outcomes 
$E_j$ (j = 1, 2, ...). They need not be independent (applications to 
Markov chains being of special interest). As usual, we suppose that 
it is in principle possible to continue the trials indefinitely, the 
probabilities $P\{E_{j_1}, E_{j_2}, \dots, E_{j_n}\}$ being defined consistently for all finite 
sequences. Let ℰ be an attribute of finite sequences; that is, we suppose 
that it is uniquely determined whether a sequence $(E_{j_1}, \dots, E_{j_n})$ 
has, or has not, the characteristic ℰ. We agree that the expression “ℰ 
occurs at the nth place in the (finite or infinite) sequence $E_{j_1}, E_{j_2}, \dots$” 
is an abbreviation for “The subsequence $E_{j_1}, E_{j_2}, \dots, E_{j_n}$ has the 
attribute ℰ.” This convention implies that the occurrence of ℰ at the 
nth trial depends solely on the outcome of the first n trials. It is also 
understood that when speaking of a “recurrent event ℰ,” we are really 
referring to a class of events defined by the property that ℰ occurs. 
Clearly ℰ itself is a label rather than an event. We are here abusing 
the language in the same way as is generally accepted in terms such 
as “a two-dimensional problem”; the problem itself is dimensionless. 


Definition 1. The attribute ℰ defines a recurrent event if: 

(a) In order that ℰ occurs at the nth and the (n+m)th place of the 
sequence $(E_{j_1}, E_{j_2}, \dots, E_{j_{n+m}})$ it is necessary and sufficient that ℰ occurs 
at the last place in each of the two subsequences $(E_{j_1}, E_{j_2}, \dots, E_{j_n})$ and 
$(E_{j_{n+1}}, E_{j_{n+2}}, \dots, E_{j_{n+m}})$. 

(b) Whenever this is the case we have 

$P\{E_{j_1}, \dots, E_{j_{n+m}}\} = P\{E_{j_1}, \dots, E_{j_n}\}\, P\{E_{j_{n+1}}, \dots, E_{j_{n+m}}\}.$


It has now an obvious meaning to say that ℰ occurs in the sequence 
$(E_{j_1}, E_{j_2}, \dots)$ for the first time at the nth place, etc. It is also clear 
that with each recurrent event ℰ there are associated the two sequences 
of numbers defined for n = 1, 2, ... as follows: 

(2.1)    $u_n = P\{\text{ℰ occurs at the nth trial}\},$
         $f_n = P\{\text{ℰ occurs for the first time at the nth trial}\}.$

It will be convenient to define 

(2.2)    $f_0 = 0, \qquad u_0 = 1,$

and to introduce the generating functions 

(2.3)    $F(s) = \sum_{k=1}^{\infty} f_k s^k, \qquad U(s) = \sum_{k=0}^{\infty} u_k s^k.$

Observe that $\{u_k\}$ is not a probability distribution; in fact, in 
representative cases we shall have $\sum u_k = \infty$. However, the events “ℰ 
occurs for the first time at the nth trial” are mutually exclusive, and 
therefore 

(2.4)    $f = \sum_{n=1}^{\infty} f_n \le 1.$

It is clear that 1 − f should be interpreted as the probability that ℰ does 
not occur in an indefinitely prolonged sequence of trials. If f = 1 we 
may introduce a random variable T with distribution 

(2.5)    $P\{T = n\} = f_n.$

We shall use the same notation (2.5) even if f < 1. Then T is an improper, 
or defective, random variable, which with probability 1 − f does 
not assume a numerical value. (For our purposes we could assign to T 
the symbol ∞, and it should be clear that no new rules are required.) 

The waiting time for ℰ, that is, the number of trials up to and including 
the first occurrence of ℰ, is a random variable with the distribution 
(2.5); however, this random variable is really defined only 
in the space of infinite sequences $(E_{j_1}, E_{j_2}, \dots)$. 

By the definition of recurrent events the probability that ℰ occurs 
for the first time at trial number k and for the second time at the nth 
trial equals $f_k f_{n-k}$. Therefore the probability $f_n^{(2)}$ that ℰ occurs for 
the second time at the nth trial equals 

(2.6)    $f_n^{(2)} = f_1 f_{n-1} + f_2 f_{n-2} + \dots + f_{n-1} f_1.$

The right side is the convolution of $\{f_n\}$ with itself and therefore 
$\{f_n^{(2)}\}$ represents the probability distribution of the sum of two independent 
random variables each having the distribution (2.5). More 
generally, if $f_n^{(r)}$ is the probability that the rth occurrence of ℰ takes 
place at the nth trial we have 

(2.7)    $f_n^{(r)} = f_1 f_{n-1}^{(r-1)} + f_2 f_{n-2}^{(r-1)} + \dots + f_{n-1} f_1^{(r-1)}.$


This simple fact is expressed in the 

Theorem. Let $f_n^{(r)}$ be the probability that the rth occurrence of ℰ takes 
place at the nth trial. Then $\{f_n^{(r)}\}$ is the probability distribution of the sum 

(2.8)    $T^{(r)} = T_1 + T_2 + \dots + T_r$

of r independent random variables $T_1, \dots, T_r$ each having the distribution 
(2.5). In other words: For fixed r the sequence $\{f_n^{(r)}\}$ has the generating 
function $F^r(s)$. 

It follows in particular that 

(2.9)    $\sum_{n=1}^{\infty} f_n^{(r)} = F^r(1) = f^r$; 

the probability that ℰ occurs at least r times equals $f^r$ (a fact which 
could have been anticipated). We now introduce 


Definition 2. A recurrent event ℰ will be called persistent² if f = 1 
and transient if f < 1. 


2 In the first edition the terms certain and uncertain were used, but the present 
terminology is preferable in applications to Markov chains. 

For a transient ℰ the probability that it occurs more than r times 
tends to zero, whereas for a persistent ℰ this probability remains unity. 
This can be described by saying with probability one: A persistent ℰ is 
bound to occur infinitely often whereas a transient ℰ occurs only a finite 
number of times. (This statement not only is a description but is formally 
correct if interpreted in the sample space of infinite sequences 
$E_{j_1}, E_{j_2}, \dots$.) 

We require one more definition. In Bernoulli trials a return to the 
origin [example (1.c)] can occur only at an even-numbered trial. In 
this case $f_{2n+1} = u_{2n+1} = 0$, and the generating functions F(s) and 
U(s) are power series in $s^2$ rather than s. Similarly, in example (1.e) 
ℰ can occur only at trials number 6, 12, 18, .... We express this by 
saying that ℰ is periodic. Such recurrent events have a great nuisance 
value; in each instance the situation is quite obvious, but all general 
theorems require mention of the nominally special case of periodicity. 


Definition 3. The recurrent event ℰ is called periodic if there exists 
an integer λ > 1 such that ℰ can occur only at trials number λ, 2λ, 3λ, ... 
(i.e., $u_n = 0$ whenever n is not divisible by λ). The greatest λ with this 
property is called the period of ℰ. 


In conclusion let us remark [...] sequences $E_{j_1}, E_{j_2}, \dots$ the number 
of trials between the (r−1)st and the [...] 
designed to reduce a fairly general situation to sums of independent 
random variables. Conversely, an arbitrary probability distribution 
$\{f_n\}$, n = 1, 2, ... may be used to define a recurrent event. We prove 
this assertion by the 


Example. Self-renewing aggregates. Consider an electric bulb, fuse, 
or other piece of equipment with a finite life span. As soon as the 
piece fails, it is replaced by a new piece of like kind, which in due time 
is replaced by a third piece, and so on. We assume that the life span 
is a random variable which ranges only over multiples of a unit time 
interval (year, day, or second). Each time unit then represents a trial 
with possible outcomes “replacement” and “no replacement.” The 
successive replacements may be treated as recurrent events. If $f_n$ is 
the probability that a new piece will serve for exactly n time units, 
then $\{f_n\}$ is the distribution of the recurrence times. When it is certain 
that the life span is finite, then $\sum f_n = 1$ and the recurrent event 
is persistent. Usually it is known that the life span cannot exceed a 
fixed number m, in which case the generating function F(s) is a polynomial 
of a degree not exceeding m. In applications we desire the 
probability $u_n$ that a replacement takes place at time n. This $u_n$ may 
be calculated from equation (3.1). Here we have a class of recurrent 
events defined solely in terms of an arbitrary distribution $\{f_n\}$. The 
case f < 1 is not excluded, 1 − f being the probability of an eternal 
life of our piece of equipment. 


3. THE BASIC RELATIONS 


We adhere to the notations (2.2)-(2.4) and propose to investigate 
the connection between the $\{f_n\}$ and the $\{u_n\}$. The probability that 
ℰ occurs for the first time at trial number ν and then again at a later 
trial n > ν is, by definition, $f_\nu u_{n-\nu}$. The probability that ℰ occurs at 
the nth trial for the first time is $f_n = f_n u_0$. Since these cases are 
mutually exclusive we have 

(3.1)    $u_n = f_1 u_{n-1} + f_2 u_{n-2} + \dots + f_n u_0, \qquad n \ge 1.$

At the right we recognize the convolution $\{f_k\} * \{u_k\}$ with the generating 
function F(s)U(s). At the left we find the sequence $\{u_n\}$ with the 
term $u_0$ missing, so that its generating function is U(s) − 1. Thus 
U(s) − 1 = F(s)U(s), and we have proved 

Theorem 1. The generating functions of $\{u_n\}$ and $\{f_n\}$ are related by 

(3.2)    $U(s) = \dfrac{1}{1 - F(s)}.$

Note. The right side in (3.2) can be expanded into a geometric series 
$\sum F^r(s)$ converging for |s| < 1. The coefficient $f_n^{(r)}$ of $s^n$ in $F^r(s)$ being 
the probability that the rth occurrence of ℰ takes place at the nth 
trial, equation (3.2) is equivalent to 

(3.3)    $u_n = f_n^{(1)} + f_n^{(2)} + f_n^{(3)} + \dots$

and expresses the obvious fact that if ℰ occurs at the nth trial, it has 
previously occurred 0, 1, 2, ..., n−1 times. (Clearly $f_n^{(r)} = 0$ for 
r > n.) 
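
As a numerical illustration of (3.1) and (3.3), the sketch below computes $\{u_n\}$ from a given $\{f_n\}$ by the recursion (3.1) and compares it with the sum of the convolution powers in (3.3). The function names and the choice of a geometric distribution $f_k = pq^{k-1}$ (for which $u_n = p$ for all n ≥ 1) are ours, chosen only because the answer is known in advance.

```python
def u_from_f(f, n_max):
    """Solve (3.1): u_n = f_1 u_{n-1} + ... + f_n u_0, starting from u_0 = 1.
    f[k] holds f_k (with f[0] = 0); entries beyond the list count as 0."""
    def fk(k):
        return f[k] if k < len(f) else 0.0
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(fk(k) * u[n - k] for k in range(1, n + 1)))
    return u

def convolve(a, b, n_max):
    """Terms 0..n_max of the convolution of two sequences of length n_max + 1."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_max + 1)]

p, q, n_max = 0.3, 0.7, 12
f = [0.0] + [p * q ** (k - 1) for k in range(1, n_max + 1)]
u = u_from_f(f, n_max)

# (3.3): add up the r-fold convolution powers of {f_n} for r = 1, ..., n_max.
total = [0.0] * (n_max + 1)
power = f[:]                          # the first power is {f_n} itself
for r in range(1, n_max + 1):
    total = [t + x for t, x in zip(total, power)]
    power = convolve(power, f, n_max)

print(u[1:5])
print(total[1:5])
```

Both lists should agree with each other and with p up to rounding, since powers with r > n contribute nothing to the nth term.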

Theorem 2. For ℰ to be transient, it is necessary and sufficient that 

(3.4)    $u = \sum_{j=0}^{\infty} u_j$

is finite. In this case the probability f that ℰ ever occurs is given by 

(3.5)    $f = \dfrac{u - 1}{u}.$

Note. We can interpret $u_j$ as the expectation of a random variable 
which equals 1 or 0 according to whether ℰ does or does not occur at 
the jth trial. Hence $u_1 + u_2 + \dots + u_n$ is the expected number of 
occurrences of ℰ in n trials, and u − 1 can be interpreted as the expected 
number of occurrences of ℰ in infinitely many trials. 


Proof. The coefficients $u_k$ being non-negative, it is clear that U(s) 
increases monotonically as s → 1 and that for each N 

$\sum_{n=0}^{N} u_n \le \lim_{s \to 1} U(s) \le \sum_{n=0}^{\infty} u_n = u.$

Since $U(s) \to (1 - f)^{-1}$ when f < 1 and U(s) → ∞ when f = 1, the 
theorem follows. 


The next theorem is of particular importance.³ The proof is of an 
elementary nature, but since it does not contribute to a probabilistic 
understanding we defer it to the end of the chapter. (See, however, 
problem 1.) 

Theorem 3. Let ℰ be persistent and not periodic and denote by μ the 
mean of the recurrence times $T_\nu$, that is, 

(3.6)    $\mu = \sum j f_j = F'(1)$

(possibly μ = ∞). Then 

(3.7)    $u_n \to \mu^{-1}$

as n → ∞ ($u_n \to 0$ if the mean recurrence time is infinite). 


3 P. Erdős, W. Feller, and H. Pollard, A theorem on power series, Bulletin of the 
American Mathematical Society, vol. 55 (1949), pp. 201-204. This theorem was 
conjectured and proved for the purpose of obtaining a better access to ergodic 
properties of infinite Markov chains established by Kolmogorov. After the appearance 
of the first edition it was observed by K. L. Chung that theorem 3 is 
really equivalent to the ergodic theorem of Kolmogorov and could be deduced from 
it. Previously a great many papers were devoted to various special cases and variants. 
Later, theorem 3 was generalized to continuous random variables and made 
more precise in various ways by Blackwell, Chung, Erdős, and Wolfowitz. Blackwell 
gave an elegant simple proof that (3.7) holds for all integral-valued random 
variables (not necessarily positive ones as in the text), provided they have a positive 
mean. His method is based on the use of ladder points for arbitrary variables 
[cf. example (1.d)]. See D. Blackwell, Extension of a renewal theorem, Pacific 
Journal of Mathematics, vol. 3 (1953), pp. 315-320. 

Theorem 4. If ℰ is persistent and has period λ > 0, then as n → ∞ 

(3.8)    $u_{n\lambda} \to \lambda \mu^{-1}$

and $u_k = 0$ for every k not divisible by λ. 

Proof. Since ℰ has period λ, the series $F(s) = \sum f_n s^n$ contains only 
powers of $s^\lambda$, and so $F(s^{1/\lambda}) = F_1(s)$ where $F_1(s)$ is again a power series 
with positive coefficients, and $F_1(1) = 1$. Theorem 3 implies that the 
coefficients of $U_1(s) = \{1 - F_1(s)\}^{-1}$ tend to $\mu_1^{-1}$ where 

$\mu_1 = F_1'(1) = \lambda^{-1} F'(1) = \lambda^{-1} \mu.$

(Clearly $\mu_1$ and μ are either both finite or both infinite.) Now 
$U(s) = U_1(s^\lambda)$ and so (3.8) holds. 

Examples. (a) For a trite example let ℰ stand for “success” in 
Bernoulli trials. Then $u_n = p$, by the very definition. Theorem 3 
states that the expected number of trials between two consecutive successes 
is $p^{-1}$. Here $U(s) = 1 + ps(1 - s)^{-1} = (1 - qs)(1 - s)^{-1}$, 
and from theorem 1 we conclude that $F(s) = ps(1 - qs)^{-1}$, showing 
that the waiting time between consecutive successes has a geometric 
distribution. 

(b) Return to the origin in Bernoulli trials [example (1.c)]. If at the 
kth trial the cumulative numbers of successes and failures are equal, 
then k must be an even number, k = 2n, and n trials must have resulted 
in success, the other n in failure. Therefore we have for the probability 
of an equalization 

(3.9)    $u_{2n} = \binom{2n}{n} p^n q^n.$

We know from the normal approximation to the binomial distribution, 
and we can also readily verify using Stirling’s formula, that 

(3.10)    $\binom{2n}{n} 2^{-2n} \sim (\pi n)^{-1/2}$

so that 

(3.11)    $u_{2n} \sim \dfrac{(4pq)^n}{(\pi n)^{1/2}},$

the sign ~ indicating that the ratio of the two sides tends to unity. 

If p ≠ ½, then 4pq < 1, and $\sum u_{2n}$ converges faster than the geometric 
series with ratio 4pq. If p = ½, then $u_{2n} \sim (\pi n)^{-1/2}$; hence $\sum u_{2n}$ diverges, 
but $u_{2n} \to 0$. Our theorems permit the conclusion that with 
probability one the following is true: 

If p ≠ q, then the cumulative sums $S_n$ will vanish only finitely many 
times. If p = q = ½, they will pass through 0 infinitely often, but the 
mean recurrence time is infinite. 

In the case p ≠ q the assertion is obvious intuitively and follows 
also from the strong law of large numbers. In gambling language, if 
the game is favorable for Peter, he can rest assured that after a few 
initial fluctuations his net gain will be positive and remain so. When 
p = q = ½, the situation is much less intuitive and is the source of the 
paradoxical features of the fluctuations in coin tossing described in 
chapter III, section 7. 

The theorems above can supply additional information. Using the 
readily verified formula 

(3.12)    $\binom{2n}{n} = \binom{-1/2}{n} (-4)^n$

and the binomial expansion II(8.7), we get from (3.9) 

(3.13)    $U(s) = \sum_{n=0}^{\infty} u_{2n} s^{2n} = (1 - 4pqs^2)^{-1/2}.$

If p ≠ ½, then $u = U(1) = (1 - 4pq)^{-1/2} = |p - q|^{-1}$. From (3.5) 
we conclude that the probability f that the accumulated numbers of successes 
and failures will ever equalize is given by 

(3.14)    $f = 1 - |p - q|.$

(This is the probability of at least one return to the origin.) 
From (3.2) we get for the generating function of the recurrence times 

(3.15)    $F(s) = 1 - (1 - 4pqs^2)^{1/2}.$

This formula is most interesting in the case p = q = ½. Then 

(3.16)    $F(s) = 1 - (1 - s^2)^{1/2}$

and the binomial expansion shows that 

(3.17)    $f_{2n} = (-1)^{n+1} \binom{1/2}{n} = \dfrac{1}{n} \binom{2n-2}{n-1} 2^{-2n+1}$

($f_n$ vanishes whenever n is odd). Equation (3.17) gives the distribution 
of the recurrence times for the return to the origin in the classical coin-tossing 
game. 
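
Formulas (3.14) and (3.17) can be checked numerically without generating functions. The sketch below, with function names of our own choosing, builds $u_n$ from (3.9), inverts the basic relation (3.1) to obtain $f_n$, and then compares the total probability with 1 − |p − q| (for the arbitrarily chosen value p = 0.7) and the symmetric case with the closed form (3.17).

```python
from math import comb

def return_probabilities(p, n_max):
    """u_0, ..., u_{n_max} from (3.9); u_n = 0 when n is odd."""
    q = 1.0 - p
    return [comb(n, n // 2) * (p * q) ** (n // 2) if n % 2 == 0 else 0.0
            for n in range(n_max + 1)]

def first_return_distribution(u):
    """Invert (3.1): f_n = u_n - (f_1 u_{n-1} + ... + f_{n-1} u_1)."""
    f = [0.0]
    for n in range(1, len(u)):
        f.append(u[n] - sum(f[k] * u[n - k] for k in range(1, n)))
    return f

p = 0.7
f = first_return_distribution(return_probabilities(p, 200))
print(sum(f), 1 - abs(p - (1 - p)))       # both close to 0.6

# Symmetric case p = q = 1/2: compare with the closed form (3.17).
g = first_return_distribution(return_probabilities(0.5, 20))
closed = [comb(2 * n - 2, n - 1) / n * 2.0 ** (1 - 2 * n) for n in range(1, 11)]
print(max(abs(g[2 * n] - closed[n - 1]) for n in range(1, 11)))
```

The truncation at 200 trials is harmless for p = 0.7 because the neglected terms decrease geometrically with ratio 4pq = 0.84; for p close to ½ many more terms would be needed.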

(We have obtained this formula by different methods in chapter III, 
section 4, and chapter XI, section 3. The present method, although 
not the most elementary, is the most straightforward.) 

(c) Ties in multiple coin games. We consider repeated independent 
tossings of two coins and say that ℰ has occurred whenever the accumulated 
number of heads (and therefore of tails) is the same for both 
coins. Clearly 

(3.18)    $u_n = 2^{-2n} \left\{ \binom{n}{0}^2 + \binom{n}{1}^2 + \dots + \binom{n}{n}^2 \right\}.$

Using II(12.11) and (3.10), we find that 

(3.19)    $u_n = \binom{2n}{n} 2^{-2n} \sim (\pi n)^{-1/2}.$

Hence $\sum u_n$ diverges, but $u_n \to 0$. Therefore ℰ is persistent but has 
infinite mean recurrence time. 

More generally, consider the simultaneous tossing of r coins, and let 
ℰ stand for the recurrent event that all r coins are in the same phase 
(accumulated numbers of heads are the same for all coins). Then 

(3.20)    $u_n = 2^{-rn} \left\{ \binom{n}{0}^r + \binom{n}{1}^r + \dots + \binom{n}{n}^r \right\}.$

To estimate $u_n$ note that the maximal term of the binomial distribution 
$\binom{n}{k} 2^{-n}$ is smaller than $n^{-1/2}$. Therefore 

(3.21)    $u_n < n^{-(r-1)/2} \cdot 2^{-n} \left\{ \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{n} \right\} = n^{-(r-1)/2}.$

Accordingly $\sum u_n$ converges if r ≥ 4. For r = 2 we saw that $\sum u_n$ 
diverges. A special consideration is necessary for the case r = 3. 
From the normal approximation to the binomial distribution we know 
that for sufficiently large n and values of k lying between $\tfrac12 n - n^{1/2}$ and 
$\tfrac12 n + n^{1/2}$ we have $\binom{n}{k} 2^{-n} > c n^{-1/2}$, where c is a positive constant (say 
$e^{-[\ldots]}$). Therefore, when r = 3, 

(3.22)    $u_n > 2 n^{1/2} (c n^{-1/2})^3 = 2c^3/n,$

and hence $\sum u_n$ diverges. In other words, the recurrent event ℰ that k 
coins show the same cumulative numbers of heads is persistent if, and only 
if, k ≤ 3. The mean recurrence time is infinite in each case. 
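
The probabilities (3.20) are easy to evaluate exactly, which gives a feeling for how slowly the series diverges for two and three coins. The following sketch uses helper names of our own and assumes perfect coins; it prints partial sums of $\sum u_n$ for r = 2, 3, 4 coins.

```python
from fractions import Fraction
from math import comb

def tie_probability(r, n):
    """u_n from (3.20) for r fair coins, as an exact fraction."""
    return Fraction(sum(comb(n, k) ** r for k in range(n + 1)), 2 ** (r * n))

def partial_sum(r, n_max):
    """u_1 + ... + u_{n_max}: the expected number of ties in n_max rounds."""
    return float(sum(tie_probability(r, n) for n in range(1, n_max + 1)))

for r in (2, 3, 4):
    print(r, [round(partial_sum(r, m), 3) for m in (10, 40, 160)])
```

For r = 2 the exact values reproduce (3.19), and for r = 4 each term stays below the bound $n^{-3/2}$ of (3.21). A finite table of partial sums can only suggest divergence or convergence; the proof is the estimate in the text.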

(d) Dice. In example (1.e) we considered the recurrent event ℰ that 
the accumulated numbers of aces, twos, threes, etc., are equal. Obviously 
ℰ has period 6 and $u_{6n} = (6n)!\,(n!)^{-6}\, 6^{-6n}$. Using Stirling’s formula, 
we readily find that $u_{6n}$ is of the order of magnitude $n^{-5/2}$, so that 
$\sum u_n$ converges. Hence ℰ is transient. From (3.5) it is easy to calculate 
that the probability of a recurrence is about 0.022. 
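
The value 0.022 can be reproduced by summing the series $\sum u_{6n}$ numerically and applying (3.5). This is a sketch only: the function names are ours, the factorials are handled through logarithms to avoid overflow, and the cut-off of 2000 terms is an arbitrary choice that leaves a negligible remainder because the terms decrease like $n^{-5/2}$.

```python
from math import exp, lgamma, log

def dice_u(n):
    """u_{6n} = (6n)! (n!)^{-6} 6^{-6n}, evaluated through logarithms."""
    return exp(lgamma(6 * n + 1) - 6 * lgamma(n + 1) - 6 * n * log(6))

def recurrence_probability(terms=2000):
    """f = (u - 1)/u as in (3.5), with u = 1 + u_6 + u_12 + ... truncated."""
    u = 1.0 + sum(dice_u(n) for n in range(1, terms + 1))
    return (u - 1.0) / u

print(round(recurrence_probability(), 4))
```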

(e) For applications to the theory of runs see sections 7 and 8. 


4. THE RENEWAL EQUATION 


The basic equation (3.1) of the theory of recurrent events is a special 
case of the so-called renewal equation (4.1), which is encountered in 
many different connections. We proceed to show that the theorems 
of the last section apply without essential modification to this more 
general equation. The discussion will be of a purely analytic character, 
probabilistic interpretations and applications being reserved for the 
next section. 

Let $\{a_n\}$ and $\{b_n\}$ be two sequences such that $0 \le a_n < 1$ and $b_n \ge 0$ 
(where n = 0, 1, 2, ...). A third sequence $\{u_n\}$ is defined by the recursive 
relations 

(4.1)    $u_n = b_n + (a_0 u_n + a_1 u_{n-1} + \dots + a_n u_0)$

or 

(4.2)    $\{u_n\} = \{b_n\} + \{a_n\} * \{u_n\}.$

Solving (4.1) successively, we get 

$u_0 = b_0/(1 - a_0), \qquad u_1 = (b_1 + a_1 u_0)/(1 - a_0), \quad \dots$

so that no problem about the existence of a unique solution $\{u_n\}$ arises. 
We are interested in the behavior of $\{u_n\}$ as n → ∞, a problem to 
which a great number of papers (mostly of controversial nature) have 
been devoted. 
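
The successive solution of (4.1) is a one-line recursion. The sketch below (the name `solve_renewal` and the two test sequences are our own choices) solves for $u_0, \dots, u_n$ and tries two simple cases: one with $\sum a_n = 1$, where $u_n$ should approach B(1)/μ, and one with $\sum a_n < 1$, where $\sum u_n$ should equal B(1)/{1 − A(1)}.

```python
def solve_renewal(a, b, n_max):
    """Solve (4.1), u_n = b_n + a_0 u_n + a_1 u_{n-1} + ... + a_n u_0,
    successively for n = 0, ..., n_max.  Entries beyond the end of a or b
    are taken as 0; a[0] must be less than 1."""
    def get(seq, k):
        return seq[k] if k < len(seq) else 0.0
    u = []
    for n in range(n_max + 1):
        earlier = sum(get(a, k) * u[n - k] for k in range(1, n + 1))
        u.append((get(b, n) + earlier) / (1.0 - get(a, 0)))
    return u

# Case sum a_n = 1: a_1 = a_2 = 1/2, so mu = 1.5 and B(1) = 1.
u = solve_renewal([0.0, 0.5, 0.5], [1.0], 60)
print(u[60], 1 / 1.5)

# Case sum a_n < 1: a_1 = a_2 = 1/4, so B(1) / (1 - A(1)) = 2.
w = solve_renewal([0.0, 0.25, 0.25], [1.0], 200)
print(sum(w))
```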

Setting $b_n = 0$, $a_n = f_n$ for n = 1, 2, ... and $b_0 = 1$, $a_0 = 0$ reduces 
equation (4.1) to (3.1). Formally, therefore, the renewal equation (4.1) 
is more general, but we shall derive its properties from those of (3.1). 
Once more we introduce the generating functions 

(4.3)    $A(s) = \sum a_n s^n, \qquad B(s) = \sum b_n s^n, \qquad U(s) = \sum u_n s^n.$

The coefficients $a_n$ and $b_n$ being bounded, the first two series converge 
at least for |s| < 1; the convergence of the last series will presently 
become evident. Equation (4.1) can now be rewritten in the form 
U(s) = B(s) + A(s)U(s) or 

(4.4)    $U(s) = \dfrac{B(s)}{1 - A(s)}.$

For B(s) = 1 this reduces to (3.2) with the essential difference that 
now $\{a_n\}$ is not necessarily the distribution of a recurrence time, so 
that A(s) can be larger as well as smaller than 1. 

We shall say that we have the periodic case if there exists an integer 
λ > 1, such that all $a_k$ except, perhaps, $a_\lambda, a_{2\lambda}, a_{3\lambda}, \dots$ vanish. Then 
A(s) is a power series in $s^\lambda$. The largest integer λ with the said property 
is called the period. 


Theorem 1. Suppose that $\{a_n\}$ is not periodic and that $B(1) = \sum b_n$ 
is finite. 

(a) If $\sum a_n = 1$, then 

(4.5)    $u_n \to B(1)\mu^{-1}$ where $\mu = \sum n a_n$. 

(In particular, $u_n \to 0$ if $\sum n a_n$ diverges.) 

(b) If $\sum a_n < 1$, then the series 

(4.6)    $\sum u_n = B(1)\{1 - A(1)\}^{-1}$

converges. 

(c) If $\sum a_n > 1$ and also if the series diverges, then there exists a unique 
positive root x < 1 of the equation A(x) = 1. In this case 

(4.7)    $u_n \sim \dfrac{B(x)}{x A'(x)}\, x^{-n},$

the sign ~ indicating that the ratio of the two sides tends to unity. 
(Relation (4.7) implies that $u_n$ increases geometrically; the derivative 
A′(x) is finite since A(s) is regular for |s| < 1.) 

Proof. (a) If $v_n$ is the coefficient of $s^n$ in $\{1 - A(s)\}^{-1}$, then 
$v_n \to \mu^{-1}$ by theorem 3 of the last section. Now 

(4.8)    $u_n = v_n b_0 + v_{n-1} b_1 + \dots + v_0 b_n.$

For every fixed k the term $v_{n-k} b_k$ tends to $b_k/\mu$ as n → ∞. Moreover, 
the $v_n$ are bounded. It follows that, for N sufficiently large, $u_n$ differs 
arbitrarily little from 

(4.9)    $u'_n = v_n b_0 + v_{n-1} b_1 + \dots + v_{n-N} b_N,$

and $u'_n \to (b_0 + \dots + b_N)/\mu$ which in turn differs arbitrarily little 
from B(1)/μ. 

(b) Here the proof of theorem 2, section 3, applies without modification. 

(c) Here it suffices to apply the result under (a) to the sequences 
$\{a_n x^n\}$, $\{b_n x^n\}$, and $\{u_n x^n\}$ which have the generating functions A(xs), 
B(xs), and U(xs), and which are obviously related in the same way as 
the original sequences. 

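Unfortunately completeness requires a special mention of periodic 
sequences $\{a_n\}$ where $A(s) = \sum a_{n\lambda} s^{n\lambda}$ is a power series in $s^\lambda$. In this case 
we divide the coefficients $u_n$ into groups of equal phase, 
$\{u_0, u_\lambda, u_{2\lambda}, u_{3\lambda}, \dots\}$, $\{u_1, u_{\lambda+1}, u_{2\lambda+1}, u_{3\lambda+1}, \dots\}$, etc., 
$\{u_{\lambda-1}, u_{2\lambda-1}, u_{3\lambda-1}, \dots\}$. 
It is obvious from (4.4) that the coefficients $u_{n\lambda}$ depend only on $b_0$, 
$b_\lambda, b_{2\lambda}, \dots$ but not on the $b_k$ with k not divisible by λ. This leads us 
to represent U(s) and B(s) as the sum of λ power series in $s^\lambda$ 

(4.10)    $U(s) = U_0(s) + sU_1(s) + \dots + s^{\lambda-1} U_{\lambda-1}(s),$
          $B(s) = B_0(s) + sB_1(s) + \dots + s^{\lambda-1} B_{\lambda-1}(s),$

where 

(4.11)    $U_j(s) = \sum_{n=0}^{\infty} u_{n\lambda+j} s^{n\lambda}, \qquad B_j(s) = \sum_{n=0}^{\infty} b_{n\lambda+j} s^{n\lambda}.$

Then, from (4.4) for j = 0, 1, ..., λ − 1, 

(4.12)    $U_j(s) = \dfrac{B_j(s)}{1 - A(s)}.$

Here all functions are power series in $s^\lambda$, and the preceding theorem 
applies after the change of variables $s^\lambda = t$. This leads to 

Theorem 2. In the periodic case with period λ the sequence $\{u_n\}$ is 
asymptotically periodic; if A(1) = 1, each of the λ subsequences $\{u_{n\lambda+j}\}$ 
has a limit 

(4.13)    $\lim_{n \to \infty} u_{n\lambda+j} = \lambda B_j(1)/\mu,$

where $B_j(1) = b_j + b_{\lambda+j} + b_{2\lambda+j} + b_{3\lambda+j} + \dots$ 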

Example. Repeated averaging. Given three positive numbers $u_1$, 
$u_2$, $u_3$, define an infinite sequence $\{u_n\}$ by taking running arithmetic 
means 

(4.14)    $u_4 = \tfrac13 (u_1 + u_2 + u_3), \quad u_5 = \tfrac13 (u_2 + u_3 + u_4), \quad \dots,$
          $u_{n+3} = \tfrac13 (u_n + u_{n+1} + u_{n+2}), \quad \dots$

We seek information concerning the asymptotic behavior of $\{u_n\}$. 
More precisely, we propose to show that 

(4.15)    $u_n \to \tfrac16 (u_1 + 2u_2 + 3u_3).$

Needless to say, the same argument will apply to arbitrary means (cf. 
problems 5 and XV, 15). The point is that problems of this type are 
reducible to the renewal equation (4.1) and throw a new light on its 
nature. 

If we put 

(4.16)    $a_0 = 0, \quad a_1 = a_2 = a_3 = \tfrac13, \quad a_n = b_n = 0$ for n ≥ 4, 

then (4.14) and (4.1) agree for n ≥ 4. To reduce (4.14) to (4.1) for 
all n we have to define $b_0 = u_0 = 0$ and determine $b_1, b_2, b_3$ from 

(4.17)    $b_1 = u_1, \quad b_2 = u_2 - \tfrac13 u_1, \quad b_3 = u_3 - \tfrac13 (u_1 + u_2).$

Now we can apply theorem 1(a) to obtain (4.15) without further calculations. 
Since the generating function U(s) is rational, we can expand 
it into partial fractions to see that the limit in (4.15) is approached 
with exponential rapidity and to estimate the difference of the two 
sides. 

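
The limit (4.15) is easily observed numerically. In the sketch below the starting values 3, 10, 5 and the function name `running_means` are arbitrary choices of ours; the last lines recompute the same limit as B(1)/μ from (4.16) and (4.17), as theorem 1(a) of this section prescribes.

```python
def running_means(u1, u2, u3, n_terms):
    """The sequence (4.14): each new term is the mean of the three preceding ones."""
    u = [u1, u2, u3]
    while len(u) < n_terms:
        u.append((u[-1] + u[-2] + u[-3]) / 3.0)
    return u

u1, u2, u3 = 3.0, 10.0, 5.0
seq = running_means(u1, u2, u3, 80)
predicted = (u1 + 2 * u2 + 3 * u3) / 6.0        # right side of (4.15)

# The same limit as B(1)/mu, with b_n from (4.17) and a_1 = a_2 = a_3 = 1/3.
b = [0.0, u1, u2 - u1 / 3.0, u3 - (u1 + u2) / 3.0]
mu = (1 + 2 + 3) / 3.0
print(seq[-1], predicted, sum(b) / mu)
```
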
5. DELAYED RECURRENT EVENTS 


We shall now introduce a slight extension of the notion of recurrent 
events which is so obvious that it could pass without special mention, 
except that it is convenient to have a term for it and to have the basic 
equations on record. 

Perhaps the best informal description of delayed recurrent events is 
to say that they refer to trials where we have “missed the beginning 
and start in the middle.” The waiting time up to the first occurrence 
of ℰ has a distribution $\{b_n\}$ different from the distribution $\{f_n\}$ of the 
recurrence times between the following occurrences of ℰ. The theory 
applies without change except that the trials following each occurrence 
of ℰ are exact replicas of a fixed sample space which is not identical 
with the original one. 

The situation being so simple, we shall forego formalities and agree 
to speak of a delayed recurrent ℰ when the definition of recurrent events 
applies only if the trials leading up to the first occurrence of ℰ are disregarded; 
it is understood that the waiting time up to the first appearance of 
ℰ is a random variable independent of the following recurrence times, although 
its distribution $\{b_n\}$ may be different from the common distribution 
$\{f_n\}$ of the recurrence times. 
It is easy to calculate the probabilities $u_n$ of the occurrence of ℰ at 
the nth trial directly from the definition above and the results of section 3. 
However, it is preferable to proceed independently and to write 
down a new equation of the renewal type. 

The probability that ℰ occurs at trial number n − k and the next time 
at the nth trial equals $u_{n-k} f_k$. These events are mutually exclusive, 
and their union for k = 1, 2, ..., n − 1 is the event that ℰ occurs at 
the nth trial and at some previous trial. The probability that ℰ occurs 
at the nth trial for the first time equals $b_n$, and hence for n ≥ 1 

(5.1)    $u_n = b_n + u_{n-1} f_1 + u_{n-2} f_2 + \dots + u_1 f_{n-1}.$

For the delayed events it is most natural to set 

(5.2)    $u_0 = f_0 = b_0 = 0;$

this reduces (5.1) to the renewal equation 

(5.3)    $\{u_n\} = \{b_n\} + \{u_n\} * \{f_n\},$

and the corresponding generating functions satisfy 

(5.4)    $U(s) = \dfrac{B(s)}{1 - F(s)}.$
The results of the last section now contain as a special case the 

Theorem. If ℰ is not periodic and if $\sum f_n = 1$ (that is, ℰ is persistent), 
then 

(5.5)    $u_n \to \mu^{-1} \sum b_n, \qquad \mu = \sum n f_n.$

If $f = \sum f_n < 1$ (that is, ℰ is transient), then 

(5.6)    $\sum u_n = (1 - f)^{-1} \sum b_n.$

In the periodic case theorem 2 of section 4 applies. 

Examples. (a) In the counter problem (1.b) suppose that at time 0 
the counter was locked [...] 
observations begin two trials after a registration). [...] 
locked for at least r − 2 additional units and [...] 
that $b_{r-2} = q$, $b_{2r-2} = pq$, $b_{3r-2} = p^2 q$, ... 


(b) Self-renewing aggregates. In the example of section 2 we have 
considered a piece of equipment [...] 
with distribution $\{f_n\}$. When it expires, it is immediately replaced by 
a new piece, and the process continues in this way, ℰ standing for 
“replacement at time n.” In section 2 we assumed that at time 0 a [...] 
and we have to calculate the probability distribution $\{b_n\}$ of the waiting 
time for the first replacement. Clearly $b_n$ is the probability that a piece 
of equipment will expire at age n + k, given that it has attained age k. 
Thus 

(5.7)    $b_n = \dfrac{f_{n+k}}{r_k}$, where $r_k = f_{k+1} + f_{k+2} + f_{k+3} + \dots$

In applications it is not natural to consider just one piece of equipment 
but a whole population. Suppose then that the initial population 
(at time 0) consists of N elements, among which exactly $v_k$ are of age k 
(where $\sum v_k = N$). Each element originates a line of descendants, and 
at any time n there is a certain probability that a replacement is required 
in this line. The sum of these probabilities for all N elements 
is the expected number $u_n$ of replacements at time n. Obviously $u_n$ satisfies 
the basic equation (5.3) with 

(5.8)    $b_n = \sum_{k=0}^{\infty} \dfrac{v_k f_{n+k}}{r_k},$

and our theorems show that $u_n$ will converge. 

It is easy to calculate not only the limit of $u_n$ but also the age distribution 
at time n and its asymptotic behavior. Let $v_k(n)$ be the expected 
number of elements of age k at time n (so that $v_k(0) = v_k$). Clearly 

(5.9)    $v_k(n) = u_{n-k}\, r_k$ if k < n, 
         $v_k(n) = \dfrac{v_{k-n}\, r_k}{r_{k-n}}$ if k ≥ n. 

In the non-periodic case we know that $u_n \to B(1)/\mu = N/\mu$ as n → ∞, 
and it follows from (5.9) that $v_k(n) \to N r_k/\mu$. Hence, in the non-periodic 
case, there is a stable limiting age distribution: In the limit 
the expected number of elements of age k is $N r_k/\mu$, where N is the 
(constant) population size, and $\mu = \sum r_k$ the mean duration of life (if 
μ = ∞, then the population ages indefinitely). The basic fact is that 
the limiting age distribution is independent of the initial age distribution 
and depends only on the mortality distribution $\{f_n\}$ (cf. problems 17 
and 18). 

As 4 numerical illustration consider a population of N = 1000 ele- 
ments with the initial age distribution vo = 500, vı = 320, vz = 74, 
vg = 100, vg = 6. Assume the survival probabilities fı = 0.20, 
fo = 0.43, fs = 0.17, f4 = 0.17, fs = 0.03 (so that 5 is the maximal 
age). Here U(s) is a rational function, 

397 + 332s + 159s? + 975° + 15s* 


(5:10) UO) = 8p as — 0.436 — 0.178 — 017 — 010355" 
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and can be expanded into partial fractions, 


3(1— s) 61(1+3s/5)  87(1+8/5)  5307(1 + s?/4) 


The age distributions $\{v_k(n)\}$ for $n = 1, 2, 3, \ldots$ may be calculated
directly from the renewal equation. The columns of table 1 give these
age distributions $\{v_k(n)\}$ together with the limiting distribution and
show that the approach to the limit is not monotonic.


TABLE 1

  k      . . .      . . .      n = ∞
  2      156.8      153.3      154.2
  3       82.4       84.8       83.3
  4       12.3       12.4       12.5
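
The calculation behind table 1 can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the renewal equation (5.3) is taken in the form $u_n = b_n + f_1 u_{n-1} + \cdots + f_{n-1} u_1$ with no replacement counted at time 0, and the function name `age_distributions` and the list layout are illustrative choices, not notation from the text.

```python
def age_distributions(v0, f, n_max):
    """Expected age distributions v_k(n) for n = 0..n_max.

    v0[k] -- expected number of elements of age k at time 0
    f[j]  -- probability that a lifetime equals j (f[0] must be 0)
    """
    max_age = len(f) - 1
    # r[k] = f[k+1] + f[k+2] + ... = probability of surviving age k
    r = [sum(f[k + 1:]) for k in range(max_age + 1)]
    v0 = list(v0) + [0.0] * (max_age + 1 - len(v0))

    def b(n):
        # (5.8): expected replacements at time n among the original elements
        return sum(v0[k] * f[n + k] / r[k]
                   for k in range(max_age) if n + k <= max_age)

    # Assumed form of the renewal equation; u[0] = 0 by convention here.
    u = [0.0] * (n_max + 1)
    for n in range(1, n_max + 1):
        u[n] = b(n) + sum(f[j] * u[n - j] for j in range(1, min(n, max_age) + 1))

    # (5.9): ages 0 .. max_age - 1 (no element survives age max_age)
    table = []
    for n in range(n_max + 1):
        row = []
        for k in range(max_age):
            if k < n:
                row.append(u[n - k] * r[k])
            else:
                row.append(v0[k - n] * r[k] / r[k - n])
        table.append(row)
    return u, r, table


# Data of the numerical illustration
f = [0.0, 0.20, 0.43, 0.17, 0.17, 0.03]
v0 = [500, 320, 74, 100, 6]
u, r, table = age_distributions(v0, f, 60)
mu = sum(r)                                   # mean duration of life
limit = [1000 * rk / mu for rk in r[:5]]      # limiting distribution N r_k / mu
```

Each row `table[n]` sums to the constant population size, and for large `n` the rows approach `limit`, in agreement with the statement that the limit does not depend on the initial age distribution.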

(c) Population theory. This theory is analogous to renewal theory, 
except that the population size is variable and female births play the 
role of replacements. The essential novelty is that a mother can have 
zero, one, or more daughters, so that lines may become extinct or 
branch. We now define $a_n$ as the probability that a newborn female
will survive and at age $n$ give birth to a female child (the dependence
on the number and ages of previous children is neglected). Then $\sum a_n$
is the expected number of daughters, and hence all three possibilities
$\sum a_n < 1$, $\sum a_n = 1$, $\sum a_n > 1$ are now possible. The
preceding argument applies with this obvious modification.


6. THE NUMBER OF OCCURRENCES OF $\mathcal{E}$

Up to now we have considered the first, second, ..., $r$th occurrence
of a recurrent event $\mathcal{E}$ and taken the number of trials as a random
variable. Often it is more natural to take the opposite point of view,
namely to fix the number $n$ of trials and to consider the number $N_n$ of
occurrences of $\mathcal{E}$ in $n$ trials as a random variable. We shall
investigate the asymptotic behavior of $N_n$ for large $n$.

As in (2.8) let $T^{(r)}$ stand for the number of trials up to and including
the $r$th occurrence of $\mathcal{E}$. The probability distributions of
$T^{(r)}$ and $N_n$ are related by the obvious identity


(6.1)    $$P\{N_n \ge r\} = P\{T^{(r)} \le n\}.$$

We begin with the simple case where $\mathcal{E}$ is persistent and the
distribution $\{f_n\}$ of its recurrence times has finite mean $\mu$ and
variance $\sigma^2$. Since $T^{(r)}$ is the sum of $r$ independent variables,
the central limit theorem (chapter X, section 1) asserts that for each
fixed $x$ as $r \to \infty$


(6.2)    $$P\left\{\frac{T^{(r)} - r\mu}{\sigma r^{1/2}} < x\right\} \to \Phi(x),$$


where $\Phi(x)$ is the normal distribution function. Now let $n \to \infty$ and
$r \to \infty$ in such a way that


(6.3)    $$\frac{n - r\mu}{\sigma r^{1/2}} \to x;$$


then (6.1) and (6.2) together lead to


(6.4)    $$P\{N_n \ge r\} \to \Phi(x).$$

To write this relation in a more familiar form we introduce the reduced
variable


(6.5)    $$N_n^* = \frac{N_n - n\mu^{-1}}{\sigma n^{1/2} \mu^{-3/2}}.$$


The inequality $N_n \ge r$ takes on the form


(6.6)    $$N_n^* \ge \frac{r\mu - n}{\sigma n^{1/2} \mu^{-1/2}}
            = -\frac{n - r\mu}{\sigma r^{1/2}} \left(\frac{r\mu}{n}\right)^{1/2},$$


and (6.3) shows that the right side tends to $-x$. Thus


(6.7)    $$P\{N_n^* \ge -x\} \to \Phi(x) \quad \text{or} \quad
           P\{N_n^* < -x\} \to 1 - \Phi(x),$$


and we have proved the


Theorem 1. Normal approximation. If the recurrent event $\mathcal{E}$ is
persistent and its recurrence times have finite mean $\mu$ and variance
$\sigma^2$, then both the number $T^{(r)}$ of trials up to the $r$th
occurrence of $\mathcal{E}$ and the number $N_n$ of occurrences of
$\mathcal{E}$ in the first $n$ trials are asymptotically normally
distributed as indicated in (6.2) and (6.7).


Note that in (6.7) we have the central limit theorem applied to a
sequence of dependent variables $N_n$. The relations (6.7) make it
plausible that


(6.8)    $$E(N_n) \sim \frac{n}{\mu}, \qquad \mathrm{Var}(N_n) \sim \frac{n\sigma^2}{\mu^3},$$


but an exact proof requires an additional argument.
The usefulness of theorem 1 will be illustrated by an application to 
the theory of runs in the next section. However, it should be borne 


Theorem 2. Recurrence paradox. Let $\mathcal{E}$ be the return to the origin
in symmetric Bernoulli trials (coin tossing). The expected number
$E(N_{2n})$ of occurrences of $\mathcal{E}$ in $2n$ trials is given by


(6.9)    $$E(N_{2n}) = (2n + 1)\binom{2n}{n} 2^{-2n} - 1,$$


so that


(6.10)   $$E(N_{2n}) \sim 2(n/\pi)^{1/2}$$


(and $E(N_n)$ is of the order of magnitude $n^{1/2}$ instead of increasing
linearly with $n$).


Proof. Recalling formula XI(1.8) we may calculate $E(N_n)$ from the
“tails” in (6.1) to obtain


(6.11)   $$E(N_n) = \sum_{r=1}^{\infty} P\{N_n \ge r\} = \sum_{r=1}^{\infty} P\{T^{(r)} \le n\}.$$


The generating function of $T^{(r)}$ is $F^r(s)$, where $F(s)$ was found
to be $F(s) = 1 - (1 - s^2)^{1/2}$. Since $P\{T^{(r)} \le n\}$ is the
coefficient of $s^n$ in $F^r(s)/(1 - s)$, it follows from (6.11) that
$E(N_n)$ is the coefficient of $s^n$ in the generating function


(6.12)   $$\sum_{r=1}^{\infty} \frac{F^r(s)}{1 - s} = \frac{F(s)}{(1 - s)\{1 - F(s)\}}
            = \frac{1 + s}{(1 - s^2)^{3/2}} - \frac{1}{1 - s}.$$


It follows that


(6.13)   $$E(N_{2n}) = E(N_{2n+1}) = (-1)^n \binom{-3/2}{n} - 1,$$


and this has been rewritten in the form (6.9) for convenience [using
II(12.5)].


4 W. Feller, Fluctuation theory of recurrent events, Transactions of the
American Mathematical Society, vol. 67 (1949), pp. 98-119.

The curious implications of this theorem have been discussed at
length in chapter III, section 6. Theorem 2 of that section shows that
$N_n n^{-1/2}$ has an asymptotic distribution given by the positive part of
the normal distribution. The normalization $N_n n^{-1/2}$ stands in sharp
contrast to that of theorem 1 above.
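
Formula (6.9) is easy to check by exact arithmetic. The sketch below assumes the familiar value $u_{2k} = \binom{2k}{k} 2^{-2k}$ for the probability of a return to the origin at trial $2k$ and uses $E(N_{2n}) = u_2 + u_4 + \cdots + u_{2n}$ (cf. problem 20); the function names are illustrative.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def expected_returns(n):
    """E(N_2n) computed as u_2 + u_4 + ... + u_2n with u_2k = C(2k, k) / 4^k."""
    return sum(Fraction(comb(2 * k, k), 4 ** k) for k in range(1, n + 1))


def closed_form(n):
    """Right side of (6.9)."""
    return (2 * n + 1) * Fraction(comb(2 * n, n), 4 ** n) - 1


# The two expressions agree exactly for every n tried.
agree = all(expected_returns(n) == closed_form(n) for n in (1, 2, 10, 100))

# Comparison with the asymptotic estimate (6.10) for a large n
n = 5000
ratio = float(closed_form(n)) / (2 * sqrt(n / pi))
```

The ratio in the last line is close to, but slightly below, unity; the difference comes mainly from the term $-1$ in (6.9), which is negligible only against $n^{1/2}$.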


*7. APPLICATION TO THE THEORY OF SUCCESS RUNS


In the sequel $r$ will denote a fixed positive integer and $\mathcal{E}$ will
stand for the occurrence of a success run of length $r$ in a sequence of
Bernoulli trials. It is important that the length of a run be defined as
stated in example (1.a), for otherwise runs are not recurrent events, and
the calculations become more involved. As in (2.1) and (2.2), $u_n$ is the
probability of $\mathcal{E}$ at the $n$th trial, and $f_n$ is the probability
that the first run of length $r$ occurs at the $n$th trial.

The probability that the $r$ trials number $n, n-1, n-2, \ldots, n-r+1$
result in success is obviously $p^r$. In this case $\mathcal{E}$ occurs at one
among these $r$ trials; the probability that $\mathcal{E}$ occurs at the trial
number $n - k$ ($k = 0, 1, \ldots, r-1$) and the following $k$ trials result
in success is $u_{n-k} p^k$. Since these $r$ possibilities are mutually
exclusive, we get the following recurrence relation:⁵


(7.1)    $$u_n + u_{n-1} p + \cdots + u_{n-r+1} p^{r-1} = p^r.$$


This equation is valid for $n \ge r$. Clearly


(7.2)    $$u_1 = u_2 = \cdots = u_{r-1} = 0, \qquad u_0 = 1.$$


Now multiply (7.1) by $s^n$ and sum over $n = r, r+1, r+2, \ldots$. In
view of (7.2) we get on the left side


(7.3)    $$\{U(s) - 1\}(1 + ps + p^2 s^2 + \cdots + p^{r-1} s^{r-1})$$


and on the right side $p^r(s^r + s^{r+1} + \cdots)$. Summing the two
geometric series, we find


(7.4)    $$\{U(s) - 1\} \cdot \frac{1 - (ps)^r}{1 - ps} = \frac{p^r s^r}{1 - s}$$


or


(7.5)    $$U(s) = \frac{1 - s + q p^r s^{r+1}}{(1 - s)(1 - p^r s^r)}.$$


Using equation (3.2), we get for the generating function of the recurrence
times


(7.6)    $$F(s) = \frac{p^r s^r (1 - ps)}{1 - s + q p^r s^{r+1}}
            = \frac{p^r s^r}{1 - qs(1 + ps + \cdots + p^{r-1} s^{r-1})}.$$


* Sections 7 and 8 treat a special topic and may be omitted.

5 The classical approach consists in deriving a recurrence relation for
$f_n$. This method is more complicated and does not apply to, say, runs of
either kind or patterns like SSFFSS, to which our method applies without
change [cf. example (8.c)].

The fact that $F(1) = 1$ shows that in a prolonged sequence of trials
the number of runs of any length is certain to increase over all bounds.
The mean recurrence time $\mu$ can be obtained directly from (7.1) since
we know that $u_n \to \mu^{-1}$. If we require also the variance, it is
preferable to calculate the derivatives of $F(s)$. This is best done by
implicit differentiation after clearing (7.6) of the denominator. An easy
calculation then shows that the mean and variance of the recurrence times
of runs of length $r$ are


(7.7)    $$\mu = \frac{1 - p^r}{q p^r}, \qquad
           \sigma^2 = \frac{1}{(q p^r)^2} - \frac{2r + 1}{q p^r} - \frac{p}{q^2},$$


respectively. Theorem 1 of the last section implies that for large $n$
the number $N_n$ of runs of length $r$ produced in $n$ trials is approximately
normally distributed, that is, for fixed $\alpha < \beta$ the probability that


(7.8)    $$\frac{n}{\mu} + \alpha\sigma\sqrt{\frac{n}{\mu^3}} < N_n
            < \frac{n}{\mu} + \beta\sigma\sqrt{\frac{n}{\mu^3}}$$


tends to $\Phi(\beta) - \Phi(\alpha)$. This fact was first proved by von Mises,
but without the theory of recurrent events the proof requires rather
lengthy calculations. Table 2 gives a few typical means of recurrence
times.


TABLE 2

MEAN RECURRENCE TIMES FOR SUCCESS RUNS IF TRIALS ARE
PERFORMED AT THE RATE OF ONE PER SECOND

Length of Run    p = 0.6         p = 0.5 (Coins)    p = 1/6 (Dice)
r =  5           30.7 seconds    1 minute           2.6 hours
    10           6.9 minutes     34.1 minutes       28.0 months
    15           1.5 hours       18.2 hours         18,098 years
    20           19 hours        24.3 days          140.7 million years
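
The mean and variance in (7.7) and the convergence $u_n \to \mu^{-1}$ can be examined numerically. The following Python sketch evaluates (7.7) and generates the $u_n$ from the recurrence (7.1) with the initial values (7.2); the function names `run_mean_variance` and `run_probabilities` are illustrative and not taken from the text.

```python
def run_mean_variance(p, r):
    """Mean and variance of the recurrence time of success runs of length r, as in (7.7)."""
    q = 1.0 - p
    a = q * p ** r
    mean = (1.0 - p ** r) / a
    variance = 1.0 / a ** 2 - (2 * r + 1) / a - p / q ** 2
    return mean, variance


def run_probabilities(p, r, n_max):
    """u_0, ..., u_{n_max} from (7.1) and (7.2)."""
    u = [1.0] + [0.0] * n_max
    for n in range(r, n_max + 1):
        # (7.1) solved for u_n; the indices n - k used here are all >= 1
        u[n] = p ** r - sum(u[n - k] * p ** k for k in range(1, r))
    return u


mean, variance = run_mean_variance(0.5, 5)    # fair coin, runs of length 5
u = run_probabilities(0.5, 5, 400)
reciprocal_limit = 1.0 / u[400]               # should be close to the mean
```

For $p = 1/2$ and $r = 5$ the mean is 62 seconds at one trial per second, in line with the one-minute entry of table 2, and `reciprocal_limit` reproduces the same value from the recurrence alone.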

The method of partial fractions of chapter XI, section 4, permits us
to derive excellent approximations. The second representation in (7.6)
shows clearly that the denominator has a unique positive root $s = x$.
For every real or imaginary number $s$ with $|s| \le x$ we have


(7.9)    $$|qs(1 + ps + \cdots + p^{r-1} s^{r-1})|
            \le qx(1 + px + \cdots + p^{r-1} x^{r-1}) = 1,$$


where the equality sign is possible only if all terms on the left have
the same argument, that is, if $s = x$. Hence $x$ is smaller in absolute
value than any other root of the denominator in (7.6). We can,
therefore, apply formulas (4.5) and (4.9) of chapter XI with $s_1 = x$.
The coefficient $\rho_1$ is easily computed with $U(s) = p^r s^r (1 - ps)$ and
$V(s) = 1 - s + q p^r s^{r+1}$. We find, using that $V(x) = 0$,


(7.10)   $$f_n \sim \frac{(x - 1)(1 - px)}{(r + 1 - rx)q} \cdot \frac{1}{x^{n+1}}.$$


The probability of no run in $n$ trials is
$q_n = f_{n+1} + f_{n+2} + f_{n+3} + \cdots$. Equation (7.10) approximates
$q_n$ by a geometric series, and we get


(7.11)   $$q_n \sim \frac{1 - px}{(r + 1 - rx)q} \cdot \frac{1}{x^{n+1}}.$$


We have thus found that the probability of no success run of length
$r$ in $n$ trials is, asymptotically, given by (7.11). Table 3 shows that
the formula gives surprisingly good approximations even for very small
$n$, and the approximation improves rapidly with $n$. This illustrates
the power of the method of generating functions and partial fractions.


TABLE 3

PROBABILITY OF HAVING NO SUCCESS RUN OF LENGTH r = 2 IN n
TRIALS WITH p = 1/2

 n     q_n Exact    Approximation (7.11)    Error
 2     0.75         0.76631                 0.0163
 3      .625         .61996                  .0050
 4      .500         .50156                  .0016
 5      .40625       .40577                  .0005
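
The entries of table 3 can be recomputed with a short Python sketch. The exact values are obtained here by following the length of the current success streak from trial to trial (a direct enumeration chosen for this sketch, not the method of the text), and the dominant root $x$ is found by repeated substitution in $s = 1 + q p^r s^{r+1}$ starting from 1. All function names are illustrative.

```python
def no_run_exact(p, r, n):
    """Exact probability of no success run of length r in n trials."""
    q = 1.0 - p
    # state[j] = probability that no run has occurred and the current streak is j
    state = [1.0] + [0.0] * (r - 1)
    for _ in range(n):
        new = [0.0] * r
        new[0] = q * sum(state)            # a failure resets the streak
        for j in range(r - 1):
            new[j + 1] = p * state[j]      # a success lengthens it (below r)
        state = new
    return sum(state)


def dominant_root(p, r, iterations=200):
    """Smallest positive root x of s = 1 + q p^r s^(r+1), by successive substitution."""
    q = 1.0 - p
    x = 1.0
    for _ in range(iterations):
        x = 1.0 + q * p ** r * x ** (r + 1)
    return x


def no_run_approx(p, r, n):
    """Right side of (7.11)."""
    q = 1.0 - p
    x = dominant_root(p, r)
    return (1.0 - p * x) / ((r + 1 - r * x) * q) / x ** (n + 1)


rows = [(n, no_run_exact(0.5, 2, n), no_run_approx(0.5, 2, n)) for n in range(2, 6)]
```

Each element of `rows` holds $n$, the exact $q_n$, and the approximation (7.11) for $r = 2$, $p = 1/2$; the iteration for the root is used under the assumption $(r + 1)q > 1$, which holds in this case.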
Numerical Calculations. For the benefit of the practical-minded reader we
use this occasion to show that the numerical calculations involved in
partial fraction expansions are often less formidable than they appear at
first sight, and that excellent estimates of the error can be obtained.

The asymptotic expansion (7.11) raises two questions: first, the contribution
of the $r - 1$ neglected roots must be estimated, and second, the dominant
root $x$ must be evaluated.

The first representation in (7.6) shows that all roots of the denominator of
$F(s)$ satisfy the equation


(7.12)   $$s = 1 + q p^r s^{r+1},$$


although (7.12) has the additional extraneous root $s = p^{-1}$. For positive
$s$ the graph of $f(s) = 1 + q p^r s^{r+1}$ is convex; it intersects the
bisector $y = s$ at $x$ and $p^{-1}$, and in the interval between $x$ and
$p^{-1}$ the graph lies below the bisector. Furthermore,
$f'(p^{-1}) = (r + 1)q$. If this quantity exceeds unity, the graph of $f(s)$
crosses the bisector at $s = p^{-1}$ from below, and hence $p^{-1} > x$. To fix
ideas we shall assume that


(7.13)   $$(r + 1)q > 1;$$


in this case $x < p^{-1}$ and $f(s) < s$ for $x < s < p^{-1}$. It follows that
for all complex numbers $s$ such that $x < |s| < p^{-1}$ we have
$|f(s)| \le f(|s|) < |s|$, so that no root $s_k$ can lie in the annulus
$x < |s| < p^{-1}$. Since $x$ was chosen as the root smallest in absolute
value, this implies that


(7.14)   $$|s_k| > p^{-1}$$


for each root $s_k \ne x$. By differentiation of (7.12) it is now seen that
all roots are simple.

The contribution of each root to $q_n$ is of the same form as the
contribution (7.11) of the dominant root $x$, and therefore the $r - 1$ terms
neglected in (7.11) are of the form


(7.15)   $$A_k = \frac{p s_k - 1}{r s_k - (r + 1)} \cdot \frac{1}{q s_k^{n+1}}.$$


We require an upper bound for the first fraction on the right. For that
purpose note that for fixed $s > p^{-1}$


(7.16)   $$\left|\frac{p s e^{i\theta} - 1}{r s e^{i\theta} - (r + 1)}\right|
            \le \frac{ps + 1}{rs + r + 1};$$


in fact, the left side assumes its extreme values for $\theta = 0$ and
$\theta = \pi$, and a direct substitution shows that 0 corresponds to a
minimum, $\pi$ to a maximum. In view of (7.13) and (7.14) we have then


(7.17)   $$|A_k| < \frac{2p^{n+1}}{(r p^{-1} + r + 1)q} < \frac{2p^{n+2}}{rq(1 + p)}.$$


We conclude that in (7.11) the error committed by neglecting the $r - 1$
roots different from $x$ is less in absolute value than


(7.18)   $$\frac{2(r - 1)p^{n+2}}{rq(1 + p)}.$$


The root $x$ is easily calculated from (7.12) by successive approximations
putting $x_0 = 1$, $x_{\nu+1} = f(x_\nu)$. The sequence will converge
monotonically to $x$, and each term provides a lower bound for $x$, whereas
any value $s$ such that $s > f(s)$ provides an upper bound. It is easily seen
that


(7.19)   $$x = 1 + q p^r + (r + 1)(q p^r)^2 + \cdots.$$


*8. MORE GENERAL PATTERNS


Our method is applicable to more general problems which have been
regarded as considerably deeper than the theory of runs.


Examples. (a) Runs of either kind. Let $\mathcal{E}$ stand for “either a
success run of length $r$ or a failure run of length $\rho$.” We are dealing
with two recurrent events $\mathcal{E}_1$ and $\mathcal{E}_2$, where
$\mathcal{E}_1$ stands for “success run of length $r$” and $\mathcal{E}_2$ for
“failure run of length $\rho$” and $\mathcal{E}$ means “either $\mathcal{E}_1$
or $\mathcal{E}_2$.” To $\mathcal{E}_1$ there corresponds the generating
function (7.5) which will now be denoted by $U_1(s)$. The corresponding
generating function $U_2(s)$ for $\mathcal{E}_2$ is obtained from (7.5) by
interchanging $p$ and $q$ and replacing $r$ by $\rho$. The probability $u_n$
that $\mathcal{E}$ occurs at the $n$th trial is the sum of the corresponding
probabilities for $\mathcal{E}_1$ and $\mathcal{E}_2$, except that $u_0 = 1$.
It follows that


(8.1)    $$U(s) = U_1(s) + U_2(s) - 1.$$


The generating function $F(s)$ of the recurrence times of $\mathcal{E}$ is
again $F(s) = 1 - U^{-1}(s)$ or


(8.2)    $$F(s) = \frac{(1 - ps)p^r s^r (1 - q^\rho s^\rho) + (1 - qs)q^\rho s^\rho (1 - p^r s^r)}
                      {1 - s + q p^r s^{r+1} + p q^\rho s^{\rho+1} - p^r q^\rho s^{r+\rho}}.$$


The mean recurrence time follows by differentiation


(8.3)    $$\mu = \frac{(1 - p^r)(1 - q^\rho)}{q p^r + p q^\rho - p^r q^\rho}.$$


As $\rho \to \infty$, this expression tends to the mean recurrence time of
success runs as given in (7.7).

(b) In chapter VIII, section 1, we calculated the probability $x$ that
a success run of length $r$ occurs before a failure run of length $\rho$.
Define two recurrent events $\mathcal{E}_1$ and $\mathcal{E}_2$ as in example
(a). Let $x_n$ = probability that $\mathcal{E}_1$ occurs for the first time at
the $n$th trial and no $\mathcal{E}_2$ precedes it; $f_n$ = probability that
$\mathcal{E}_1$ occurs for the first time at the $n$th trial (with no
condition on $\mathcal{E}_2$). Define $y_n$ and $g_n$ as $x_n$ and $f_n$,
respectively, but with $\mathcal{E}_1$ and $\mathcal{E}_2$ interchanged.

The generating function for $f_n$ is given in (7.6), and $G(s)$ is obtained
by interchanging $p$ and $q$ and replacing $r$ by $\rho$. For $x_n$ and $y_n$
we have the obvious recurrence relations


(8.4)    $$x_n = f_n - (y_1 f_{n-1} + y_2 f_{n-2} + \cdots + y_{n-1} f_1),$$
         $$y_n = g_n - (x_1 g_{n-1} + x_2 g_{n-2} + \cdots + x_{n-1} g_1).$$


These equations are of the convolution type, and for the corresponding
generating functions we have, therefore,


(8.5)    $$X(s) = F(s) - Y(s)F(s),$$
         $$Y(s) = G(s) - X(s)G(s).$$


From these two linear equations we get


(8.6)    $$X(s) = \frac{F(s)\{1 - G(s)\}}{1 - F(s)G(s)}, \qquad
           Y(s) = \frac{G(s)\{1 - F(s)\}}{1 - F(s)G(s)}.$$


Expressions for $x_n$ and $y_n$ can again be obtained by the method of
partial fractions. For $s = 1$ we get $X(1) = \sum x_n = x$, the probability
of $\mathcal{E}_1$ occurring before $\mathcal{E}_2$. Both numerator and
denominator vanish, and $X(1)$ is obtained from L’Hospital’s rule
differentiating numerator and denominator: $X(1) = G'(1)/\{F'(1) + G'(1)\}$.
Using the values $F'(1) = (1 - p^r)/q p^r$ and $G'(1) = (1 - q^\rho)/p q^\rho$
from (7.7), we find $X(1)$ as given in equation VIII(1.3).
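
The value $X(1) = G'(1)/\{F'(1) + G'(1)\}$ can be compared with an independent computation. In the Python sketch below, `prob_success_run_first` evaluates the formula with the means from (7.7), while `prob_by_first_step` solves, by repeated substitution, the elementary first-step equations for the lengths of the current success and failure streaks. The second function is a check devised for this sketch and is not the method of the text; both names are illustrative.

```python
def mean_recurrence(p, r):
    """Mean recurrence time (7.7) of success runs of length r with success probability p."""
    q = 1.0 - p
    return (1.0 - p ** r) / (q * p ** r)


def prob_success_run_first(p, r, rho):
    """X(1) = G'(1) / (F'(1) + G'(1))."""
    f1 = mean_recurrence(p, r)            # F'(1)
    g1 = mean_recurrence(1.0 - p, rho)    # G'(1): p and q interchanged, r replaced by rho
    return g1 / (f1 + g1)


def prob_by_first_step(p, r, rho, sweeps=2000):
    """Same probability from first-step equations, solved by repeated substitution."""
    q = 1.0 - p
    s = [0.0] * (r + 1)      # s[i]: probability of winning, current success streak i
    t = [0.0] * (rho + 1)    # t[j]: probability of winning, current failure streak j
    s[r] = 1.0               # success run completed; t[rho] stays 0
    for _ in range(sweeps):
        for i in range(1, r):
            s[i] = p * s[i + 1] + q * t[1]
        for j in range(1, rho):
            t[j] = q * t[j + 1] + p * s[1]
    return p * s[1] + q * t[1]


x_formula = prob_success_run_first(0.5, 2, 3)
x_check = prob_by_first_step(0.5, 2, 3)
```

For a fair coin with $r = 2$ and $\rho = 3$ both computations give 0.7.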

(c) Consider the recurrent event defined by the pattern SSFFSS.
Repeating the argument of section 7, we easily find that


(8.7)    $$p^4 q^2 = u_n + p^2 q^2 u_{n-4} + p^3 q^2 u_{n-5}.$$


Since we know that $u_n \to \mu^{-1}$ we get for the mean recurrence time
$\mu = p^{-4} q^{-2} + p^{-2} + p^{-1}$. For $p = q = 1/2$ we find $\mu = 70$,
whereas the mean recurrence time for a success run of length 6 is 126. This
shows that, contrary to expectation, there is an essential difference in
coin tossing between head runs and other patterns of the same length.
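
The passage from a relation like (8.7) to the mean recurrence time can be mechanized. The Python sketch below assumes that the argument carries over to an arbitrary pattern of S and F: each proper overlap of the pattern with itself contributes one term on the right side, and letting $u_n \to \mu^{-1}$ gives the mean. The function names are illustrative.

```python
from fractions import Fraction


def pattern_probability(pattern, p):
    """Probability that consecutive trials spell out the given string of S and F."""
    q = 1 - p
    prob = Fraction(1)
    for letter in pattern:
        prob *= p if letter == "S" else q
    return prob


def mean_recurrence_time(pattern, p):
    """Mean recurrence time from the limiting form of a relation like (8.7)."""
    p = Fraction(p)
    total = Fraction(1)                        # coefficient of u_n
    for k in range(1, len(pattern)):
        if pattern[:k] == pattern[-k:]:        # pattern overlaps itself in k letters
            # an earlier occurrence followed by the remaining letters
            total += pattern_probability(pattern[k:], p)
    return total / pattern_probability(pattern, p)


mu_pattern = mean_recurrence_time("SSFFSS", Fraction(1, 2))
mu_run = mean_recurrence_time("SSSSSS", Fraction(1, 2))
```

For SSFFSS the overlaps of lengths 1 and 2 reproduce the two extra terms of (8.7), and the two means come out as 70 and 126.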


9. LACK OF MEMORY OF GEOMETRIC WAITING TIMES

The geometric distribution for waiting times has an interesting and
important property not shared by any other distribution. Consider a
sequence of Bernoulli trials and let $T$ be the number of trials up to and
including the first success. Then $P\{T > k\} = q^k$. Suppose we know
that no success has occurred during the first $m$ trials; the waiting time
$T$ from this $m$th failure to the first success has exactly the same
distribution $\{q^k\}$ and is independent of the number of preceding failures.
In other words, the probability that the waiting time will be prolonged
by $k$ always equals the initial probability of the total length exceeding
$k$. If the life span of an atom or a piece of equipment has a geometric
distribution, then no aging takes place; as long as it lives, the atom has
the same probability of decaying at the next trial. Radioactive atoms
actually have this property (except that in the case of a continuous
time the exponential distribution plays the role of the geometric
distribution). Conversely, if it is known that a phenomenon is
characterized by a complete lack of memory or aging, then the probability
distribution of the duration must be geometric or exponential. Typical
is a well-known type of telephone conversation often cited as the model
of incoherence and depending entirely on momentary impulses; a possible
termination is an instantaneous chance effect, without relation to
the past chatter. By contrast, the knowledge that no streetcar has
passed for five minutes increases our expectation that it will come soon.
In coin tossing, the probability that the cumulative numbers of heads
and tails will equalize at the second trial is 1/2. However, given that
they did not, the probability that they equalize after two additional
trials is only 1/4. These are examples for aftereffect.

For a rigorous formulation of the assertion, suppose that a waiting
time $T$ assumes the values 0, 1, 2, ... with probabilities
$p_0, p_1, p_2, \ldots$. Let the distribution of $T$ have the following
property: The conditional probability that the waiting time terminates at the
$k$th trial, assuming that it has not terminated before, equals $p_0$ (the
probability at the first trial). We claim that $p_k = (1 - p_0)^k p_0$, so
that $T$ has a geometric distribution.

For a proof we introduce again the “tails”


$$q_k = p_{k+1} + p_{k+2} + p_{k+3} + \cdots = P\{T > k\}.$$


Our hypothesis is $T > k - 1$, and its probability is $q_{k-1}$. The
conditional probability of $T = k$ is therefore $p_k/q_{k-1}$, and the
assumption is that for all $k \ge 1$


(9.1)    $$\frac{p_k}{q_{k-1}} = p_0.$$


Now $p_k = q_{k-1} - q_k$, and hence


(9.2)    $$\frac{q_k}{q_{k-1}} = 1 - p_0.$$


Since $q_0 = p_1 + p_2 + \cdots = 1 - p_0$, it follows that
$q_k = (1 - p_0)^{k+1}$, and hence $p_k = q_{k-1} - q_k = (1 - p_0)^k p_0$,
as asserted.
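
The contrast between lack of memory and aftereffect is easily made numerical. The Python sketch below computes, for a given distribution, the conditional probability of termination at each step given no earlier termination. It is applied to a geometric distribution and to the first return to the origin in coin tossing, for which the familiar value $f_{2n} = \binom{2n}{n}/\{(2n - 1)2^{2n}\}$ is assumed; the function name `hazards` is illustrative.

```python
from fractions import Fraction
from math import comb


def hazards(probabilities):
    """Conditional probability of termination at each step, given no earlier termination."""
    result, survived = [], Fraction(1)
    for pk in probabilities:
        result.append(pk / survived)
        survived -= pk
    return result


# Geometric waiting time: p_k = (1 - p0)^k p0 for k = 0, 1, 2, ...
p0 = Fraction(3, 10)
geometric = [(1 - p0) ** k * p0 for k in range(20)]
geometric_hazards = hazards(geometric)

# First return to the origin at trial 2n, n = 1, 2, 3, ...
first_return = [Fraction(comb(2 * n, n), (2 * n - 1) * 4 ** n) for n in range(1, 8)]
return_hazards = hazards(first_return)
```

Every entry of `geometric_hazards` equals $p_0$, as required by (9.1), whereas `return_hazards` begins with 1/2 and 1/4, the two values quoted above for the equalization of heads and tails.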

In the theory of stochastic processes the described lack of memory
is connected with the Markovian property; we shall return to it in
chapter XV, section 10.


*10. PROOF OF THEOREM 3 OF SECTION 3


In section 3 we have omitted the proof of theorem 3. The latter
can be formulated either as a “Tauberian” theorem on power series
or in an elementary way as follows. Given a sequence $\{f_n\}$ such that
$f_0 = 0$, $f_n \ge 0$, $\sum f_n = 1$, and that the greatest common divisor of
those $n$ for which $f_n > 0$ is one. Let $u_0 = 1$ and define $u_n$ for
$n \ge 1$ by


(10.1)   $$u_n = f_1 u_{n-1} + f_2 u_{n-2} + \cdots + f_n u_0.$$


Then $u_n \to 1/\mu$, where $\mu = \sum n f_n$ (and $u_n \to 0$ if $\sum n f_n$
diverges).

For the proof put


(10.2)   $$r_n = f_{n+1} + f_{n+2} + \cdots$$


so that by formula XI(1.8)


(10.3)   $$\mu = \sum r_n.$$


From (10.2) we get $r_0 = 1$, $f_1 = r_0 - r_1$, $f_2 = r_1 - r_2$, etc.
Substituting these values into (10.1), we find that
$r_0 u_n + r_1 u_{n-1} + \cdots + r_n u_0 = r_0 u_{n-1} + r_1 u_{n-2} + \cdots + r_{n-1} u_0$.
If the left side is called $A_n$, then the right side is $A_{n-1}$, and our
equation states that all $A_n$ are equal. Now $A_0 = r_0 u_0 = 1$, and hence
$A_n = 1$ for all $n$. Thus we have for every $n$


(10.4)   $$r_0 u_n + r_1 u_{n-1} + \cdots + r_n u_0 = 1.$$

From (10.1) it follows by induction that $u_n \le 1$. Hence there exists
a number $\lambda = \limsup u_n$ such that for any $\epsilon > 0$ and all
sufficiently large $n$ we have $u_n < \lambda + \epsilon$, and there exists
some sequence $n_1, n_2, n_3, \ldots$ such that $u_{n_\nu} \to \lambda$. Choose
an integer $j > 0$ such that $f_j > 0$. We claim that
$u_{n_\nu - j} \to \lambda$. If this were not so, we could find arbitrarily
large subscripts $n$ such that simultaneously


(10.5)   $$u_n > \lambda - \epsilon, \qquad u_{n-j} < \lambda' < \lambda.$$


Now let $N$ be so large that $r_N < \epsilon$. Since $u_k \le 1$, we have then
from (10.1) for $n > N$


(10.6)   $$u_n \le f_0 u_n + f_1 u_{n-1} + \cdots + f_N u_{n-N} + \epsilon.$$


For sufficiently large $n$ each $u_k$ on the right side is less than
$\lambda + \epsilon$, and $u_{n-j} < \lambda'$. Hence


(10.7)   $$\begin{aligned}
           u_n &< (f_0 + f_1 + \cdots + f_{j-1} + f_{j+1} + \cdots + f_N)(\lambda + \epsilon)
                  + f_j \lambda' + \epsilon \\
               &\le (1 - f_j)(\lambda + \epsilon) + f_j \lambda' + \epsilon
                  < \lambda + 2\epsilon - f_j(\lambda - \lambda').
           \end{aligned}$$


If we choose $\epsilon$ so small that $f_j(\lambda - \lambda') > 3\epsilon$,
then the last inequality contradicts the first one in (10.5), so that the
assumption $\lambda' < \lambda$ is impossible.


* This section should be omitted at first reading.

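This proves that, whenever $u_{n_\nu} \to \lambda$, also
$u_{n_\nu - j} \to \lambda$. Repeating the argument, we see: If $f_j > 0$ and
$u_{n_\nu} \to \lambda = \limsup u_n$, then also


$$u_{n_\nu - j} \to \lambda, \quad u_{n_\nu - 2j} \to \lambda, \quad
  u_{n_\nu - 3j} \to \lambda, \quad \text{etc.}$$


For simplicity let us first consider the case where $f_1 > 0$. Then we
can take $j = 1$ and conclude that $u_{n_\nu - k} \to \lambda$ for every fixed
$k$. From (10.4) we find for $n = n_\nu$


(10.8)   $$1 \ge r_0 u_{n_\nu} + r_1 u_{n_\nu - 1} + \cdots + r_N u_{n_\nu - N}.$$


For fixed $N$ every $u_{n_\nu - k} \to \lambda$, so that
$1 \ge \lambda(r_0 + r_1 + \cdots + r_N)$. Since $N$ is arbitrary, we conclude
that $1 \ge \lambda\mu$ or $\lambda \le 1/\mu$. This completes the proof for the
case where (10.3) diverges, for then $u_n \to 0$.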

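If $\mu < \infty$, let $\gamma = \liminf u_n$. The same argument shows that,
for every sequence $n_\nu$ for which $u_{n_\nu} \to \gamma$, also
$u_{n_\nu - k} \to \gamma$. If $N$ is large enough that
$r_{N+1} + r_{N+2} + \cdots < \epsilon$, then from (10.4), since $u_k \le 1$,


(10.9)   $$1 \le r_0 u_{n_\nu} + \cdots + r_N u_{n_\nu - N} + \epsilon;$$


herein $u_{n_\nu - k} \to \gamma$, so that
$1 \le (r_0 + \cdots + r_N)\gamma + \epsilon$ and hence $\mu\gamma \ge 1$.
However, by definition, $\gamma \le \lambda$. Therefore
$\gamma = \lambda = 1/\mu$, as was to be proved.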

There remains the case where $f_1 = 0$. Consider then the collection
of all integers $j$ for which $f_j > 0$. Among them we can find a finite
collection $a, b, c, \ldots, m$ whose greatest common divisor is 1. We know
that, when $u_{n_\nu} \to \lambda$, also $u_{n_\nu - xa} \to \lambda$,
$u_{n_\nu - yb} \to \lambda$, etc., for every fixed $x > 0$, $y > 0$, ...,
$w > 0$; hence also $u_{n_\nu - xa - yb - \cdots - wm} \to \lambda$. In other
words, if an integer $k$ is of the form $k = xa + yb + \cdots + wm$ with
positive integers $x, y, \ldots, w$, then $u_{n_\nu - k} \to \lambda$. Now it
is known from elementary number theory that every integer $k$ exceeding the
product $abc \cdots m$ can be written in this form. This means that for
$k > abc \cdots m$ we have $u_{n_\nu - k} \to \lambda$. To get the inequality
(10.8) it suffices to apply (10.4) to $n = n_\nu + ab \cdots m$. The remaining
part of the proof requires
no change. 
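
The identity (10.4) and the limit $u_n \to 1/\mu$ can be observed numerically. The Python sketch below uses an arbitrarily chosen distribution with $f_1 = 0$, $f_2 = f_3 = 1/2$ (so that the greatest common divisor condition holds although $f_1$ vanishes); the distribution and the function names are assumptions made for illustration only.

```python
def renewal_sequence(f, n_max):
    """u_0 = 1 and u_n = f_1 u_{n-1} + ... + f_n u_0 as in (10.1); f[0] must be 0."""
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(f[k] * u[n - k] for k in range(1, min(n, len(f) - 1) + 1)))
    return u


def left_side_10_4(u, r, n):
    """r_0 u_n + r_1 u_{n-1} + ... + r_n u_0, with r_k = 0 beyond the end of the list."""
    return sum(r[k] * u[n - k] for k in range(min(n, len(r) - 1) + 1))


f = [0.0, 0.0, 0.5, 0.5]                          # f_2 = f_3 = 1/2
mu = sum(k * fk for k, fk in enumerate(f))        # mean recurrence time, 2.5
r = [sum(f[k + 1:]) for k in range(len(f))]       # tails (10.2)
u = renewal_sequence(f, 200)
identity_holds = all(abs(left_side_10_4(u, r, n) - 1.0) < 1e-12 for n in range(201))
```

Here `identity_holds` confirms (10.4) for every $n$ up to 200, and `u[200]` agrees with $1/\mu = 0.4$ to within rounding error.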
11. PROBLEMS FOR SOLUTION


1. Suppose that $F(s)$ is a polynomial. Prove for this case all theorems of
section 3, using the partial fraction method of chapter XI, section 4.

2. Let $r$ coins be tossed repeatedly and let $\mathcal{E}$ be the recurrent
event that for each of the $r$ coins the accumulated number of heads and
tails are equal. Is


3. Let $\{X_k\}$ be a sequence of mutually independent random variables with
the common distribution $P\{X_k = a\} = b/(a + b)$, $P\{X_k = -b\} = a/(a + b)$,
where $a$ and $b$ are positive integers. Let $\mathcal{E}$ denote the event
$S_n = 0$. Prove that $\mathcal{E}$ is a persistent event.

4. Let $\{X_k\}$ be an arbitrary sequence of mutually independent random
variables with a common distribution, and let $\mathcal{E}$ stand for
$S_n = 0$, $S_1 \le 0$, $S_2 \le 0$, ..., $S_{n-1} \le 0$. Prove that
$\mathcal{E}$ is a transient recurrent event except in the trivial case where
$P\{X_k = 0\} = 1$.

5. Repeated averaging. Modify the example of section 4 so as to permit
arbitrary weighted averages and find the limit.

Note: Problems 6-8 refer to Bernoulli trials with $p = q = 1/2$ (coin tossing).
The generating function $F(s) = 1 - (1 - s^2)^{1/2}$ for the return to zero is
assumed to be known.

6. Let $\mathcal{E}$ be the recurrent event $S_n = 0$, $S_{n-1} < 0$. Find the
generating function $F(s)$ of the recurrence times.

7. Continuation. Find the generating function of the recurrence time for 
ladder points 


(example 1.4). (Note that this is the same as the waiting time for 
the first passage through 1 discussed in chapter XI, section 3.) 


9. In the counter problem (1.b): (a) Find the generating function of the
recurrence time. (What is its physical significance?) (b) If $Z_n$ is the
number of registrations in the first $n$ trials, find $E(Z_n)$ and
$\mathrm{Var}(Z_n)$.

10. Counters of Type II differ from those in example (1.b) in that each
success locks the counter for $r$ time units ($r - 1$ trials following the
success) so that a success during a locked period prolongs that period. Do
problem 9 for such counters.

11. Find an approximation to the probability that in 10,000 tossings of a
coin the number of head runs of length 3 will lie between 700 and 730.

12. In a sequence of tossings of a coin let $\mathcal{E}$ stand for the pattern
HTH. Let $r_n$ be the probability that $\mathcal{E}$ does not occur in $n$
trials. Find the generating function and use the partial fraction method to
obtain an asymptotic expansion.

13. In example (8.b) show that the expected duration of the game is


$$\frac{\mu_1 \mu_2}{\mu_1 + \mu_2},$$


where $\mu_1$ and $\mu_2$ are the mean recurrence times for success runs of
length $r$ and failure runs of length $\rho$, respectively.

14. The possible outcomes of each trial are A, B, and C; the corresponding
probabilities are $\alpha, \beta, \gamma$ ($\alpha + \beta + \gamma = 1$). Find
the generating function of the probability that in $n$ trials there is no run
of length $r$: (a) of A’s, (b) of A’s or B’s, (c) of any kind.

15. Continuation. Find the probability that the first A-run of length $r$
precedes the first B-run of length $\rho$ and terminates at the $n$th trial.
[Note that this problem does not reduce to that of example (8.b) with
$p = \alpha/(\alpha + \beta)$, $q = \beta/(\alpha + \beta)$.]


Note: The following problems refer to the renewal theory, specifically to example (5.b). 


16. Constancy of the population. For the quantities (5.9) prove by induction
that $\sum_k v_k(n) = N$ for every $n$.

17. If the mortality distribution is given by $f_k = q^{k-1} p$ (with
$p + q = 1$), find $u_n$ and the limiting age distribution, assuming that the
original population consists of $N$ elements aged zero.

18. An age distribution is called stationary if $v_k(n)$ does not depend on
$n$. Show that this is the case if, and only if, $v_k = C r_k$, where $C$ is a
constant.


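19. Let $\mathcal{E}$ be a persistent aperiodic recurrent event. Assume that
the recurrence time has finite mean $\mu$ and variance $\sigma^2$. Put
$q_n = f_{n+1} + f_{n+2} + \cdots$ and $r_n = q_{n+1} + q_{n+2} + \cdots$. Show
that the generating functions $Q(s)$ and $R(s)$ converge for $s = 1$. Prove
that


(11.1)   $$\sum_{n=0}^{\infty} \left(u_n - \frac{1}{\mu}\right) s^n = \frac{R(s)}{\mu Q(s)}$$


and hence that


(11.2)   $$\sum_{n=0}^{\infty} \left(u_n - \frac{1}{\mu}\right) = \frac{\sigma^2 - \mu + \mu^2}{2\mu^2}.$$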

20. Let $\mathcal{E}$ be a persistent recurrent event and $N_r$ the number of
occurrences of $\mathcal{E}$ in $r$ trials. Prove that
$E(N_r) = u_1 + \cdots + u_r$ and hence


(11.3)   $$E(N_r) \sim \frac{r}{\mu}.$$


21. Continuation. Prove that


$$E(N_r^2) = u_1 + \cdots + u_r + 2\sum_{j=1}^{r-1} u_j (u_1 + \cdots + u_{r-j})$$


and hence that $E(N_r^2)$ is the coefficient of $s^r$ in


(11.4)   $$\frac{F^2(s) + F(s)}{(1 - s)\{1 - F(s)\}^2}.$$


(Note that this may be reformulated more elegantly using bivariate generating
functions.)
22. Let $q_{k,n} = P\{N_k = n\}$. Show that $q_{k,n}$ is the coefficient of
$s^k$ in


(11.5)   $$F^n(s)\, \frac{1 - F(s)}{1 - s}.$$


Deduce that $E(N_k)$ and $E(N_k^2)$ are the coefficients of $s^k$ in


(11.6)   $$\frac{F(s)}{(1 - s)\{1 - F(s)\}}$$


and (11.4), respectively.
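23. Using the notations of problem 19, show that


(11.7)   $$\frac{F(s)}{(1 - s)\{1 - F(s)\}} + \frac{1}{1 - s}
            = \frac{1}{\mu(1 - s)^2} + \frac{R(s)}{\mu\{1 - F(s)\}}.$$


Hence, using the last problem, conclude that


(11.8)   $$E(N_r) = \frac{r + 1}{\mu} + \frac{\sigma^2 - \mu - \mu^2}{2\mu^2} + \epsilon_r$$


with $\epsilon_r \to 0$.

24. Continuation. Using a similar argument, show that


(11.9)   $$E(N_r^2) = \frac{(r + 2)(r + 1)}{\mu^2}
            + \frac{2\sigma^2 - 2\mu - \mu^2}{\mu^3}\, r + \alpha_r,$$


where $\alpha_r$ remains bounded. Hence


(11.10)  $$\mathrm{Var}(N_r) \sim \frac{\sigma^2}{\mu^3}\, r.$$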

25. In a sequence of Bernoulli trials let $q_{k,n}$ be the probability that
exactly $n$ success runs of length $r$ occur in $k$ trials. Using problem 22,
show that the generating function $Q_k(x) = \sum q_{k,n} x^n$ is the
coefficient of $s^k$ in


$$\frac{1 - p^r s^r}{1 - s + q p^r s^{r+1} - (1 - ps)p^r s^r x}.$$


Show, furthermore, that the root of the denominator which is smallest in
absolute value is $s_1 \approx 1 + q p^r (1 - x)$.

26. Continuation. The Poisson distribution of long runs.⁶ If the number
$k$ of trials and the length $r$ of runs both tend to infinity, so that
$k q p^r \to \lambda$, then the probability of having exactly $n$ runs of
length $r$ tends to $e^{-\lambda}\lambda^n/n!$.

Hint: Using the preceding problem, show that the generating function is
asymptotically $\{1 + q p^r (1 - x)\}^{-k} \sim e^{-\lambda(1 - x)}$. Use the
continuity theorem of chapter XI, section 6.


6 The theorem was proved by von Mises, but the present method is considerably
simpler.


CHAPTER XIV 


Random Walk and Ruin Problems 


1. GENERAL ORIENTATION 


The first part of this chapter is devoted to Bernoulli trials, and once 
more the picturesque language of betting and random walks is used to 
simplify and enliven the formulations. 

Consider the familiar gambler who wins or loses a dollar with prob-
abilities p and q, respectively. Let his initial capital be z and let him
play against an adversary with initial capital a — z, so that the com- 
bined capital is a. The game continues until the gambler’s capital 
either is reduced to zero or has increased to a, that is, until one of the 
two players is ruined. We are interested in the probability of the 
gambler’s ruin and the probability distribution of the duration of the 
game. This is the classical ruin problem. 

Physical applications and analogies suggest the more flexible inter- 
pretation in terms of the motion of a variable point or “particle” on 
the x-axis. At time 0 this particle is at its initial position z, and at
times 1, 2, 3, ... it moves a unit step in the positive or negative direc- 
tion, depending on whether the corresponding trial resulted in success 
or failure. The position of the particle at time n represents the gam- 
bler’s capital at the conclusion of the nth trial. The trials terminate 
when the particle for the first time reaches either 0 or a, and we describe 
this by saying that the particle performs a random walk with absorbing 
barriers at 0 and a. This random walk is restricted to the possible posi-
tions 1, 2, ..., a−1; in the absence of absorbing barriers the random
walk is called unrestricted. Physicists use the random-walk model as
a crude approximation to one-dimensional diffusion or Brownian mo-
tion, where a physical particle is exposed to a great number of mo-
lecular collisions which impart to it a random motion. The case p > q
corresponds to a drift to the right when shocks from the left are more
probable; when p = q = 1/2, the random walk is called symmetric.

In the limiting case a → ∞ we get a random walk on a semi-infinite
line: A particle starting at z > 0 performs a random walk up to the
moment when it for the first time reaches the origin. In this formula-
tion we recognize the first-passage time problem; it was solved by ele-
mentary methods in chapter III (at least for the symmetric case) and
by the use of generating functions in chapter XI, section 3 (see also
problem XIII, 7). We shall recognize formulas previously obtained,
but the present derivation is new.

In this chapter we shall use the method of difference equations which 
serves as an introduction to the differential equations of diffusion 
theory. This analogy leads in a natural way to various modifications 
and generalizations of the classical ruin problem, a typical and instruc- 
tive example being the replacing of absorbing barriers by reflecting and 
elastic barriers. To describe a reflecting barrier, consider a random 
walk in the interval (0, a) as defined before but with the modification 
that whenever the particle is at point 1 it has probability p of moving 
to position 2 and probability q to stay at 1. In gambling terminology 
this corresponds to a convention that whenever the gambler loses his 
last dollar it is generously replaced by his adversary so that the game 
can continue. The physicist imagines a wall placed at the point 1/2 of
the x-axis with the property that a particle moving from 1 toward 0
is reflected at the wall and returns to 1 instead of reaching 0. Both
the absorbing and the reflecting barriers are special cases of the so-called
elastic barrier. We define an elastic barrier at the origin by the rule that
from position 1 the particle moves with probability p to position 2; with
probability δq it stays at 1; and with probability (1 − δ)q it moves to 0
and is absorbed (i.e., the process terminates). For δ = 0 we have the
classical ruin problem or absorbing barriers, for δ = 1 reflecting bar-
riers. As δ runs from 0 to 1 we have a family of intermediate cases.
The greater δ is, the more likely is the process to continue, and with
two reflecting barriers the process can never terminate. 
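
The rule defining the elastic barrier can be written out as a one-step transition table. The following Python lines are only a minimal sketch: the function name `step_from_one` and the dictionary layout are illustrative assumptions and do not come from the text.

```python
def step_from_one(p, delta):
    """One-step transition probabilities from position 1 next to an
    elastic barrier at the origin (delta is the parameter called δ above)."""
    q = 1.0 - p
    return {
        2: p,                    # move one unit to the right
        1: delta * q,            # reflected: the particle stays at 1
        0: (1.0 - delta) * q,    # absorbed at the origin
    }

# delta = 0 reproduces the absorbing barrier, delta = 1 the reflecting barrier
absorbing = step_from_one(0.4, 0.0)
reflecting = step_from_one(0.4, 1.0)
```

For every δ between 0 and 1 the three probabilities add to p + q = 1, so the rule is a proper one-step distribution.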

Sections 2 and 3 are devoted to an elementary discussion of the 
classical ruin problem and its implications. The next three sections 
are more technical (and may be omitted); in 4 and 5 we derive the 
relevant generating functions and from them explicit expressions for 
the distribution of the duration of the game, etc. Section 6 contains 
an outline of the passage to the limit to the diffusion equation (the 
formal solutions of the latter being the limiting distributions for the 
random walk). 


1 Conversely, some of the new results can be proved also by the method of chap- 
ter III. For the solution of the ruin problem by infinitely many reflections see 
problems 7-9.


In section 7 the discussion again turns elementary and is devoted to
random walks in two or more dimensions where new phenomena are en- 
countered. Section 8 treats a generalization of an entirely different 
type, namely a random walk in one dimension where the particle is 
no longer restricted to move in unit steps but is permitted to change 
its position in jumps which are arbitrary multiples of unity. Such 
generalized random walks have attracted widespread interest in connection
with Wald’s theory of sequential sampling.

In conclusion it must be emphasized that each random walk repre- 
sents a special Markov chain, and so the present chapter serves partly 
as an introduction to the next where several random-walk problems 
(e.g., elastic barriers) will be reformulated. 

The problem section contains essential complements to the text and 
outlines of alternative approaches. It is hoped that a comparison of 
the methods used will prove highly instructive. (Readers desiring to 
refer to the graphs and the text of chapter III are asked to visualize
the time axis horizontally and the x-axis in the vertical position.)


2. THE CLASSICAL RUIN PROBLEM


We shall consider the problem stated at the opening of the present 
chapter. Let $q_z$ be the probability of the gambler’s ultimate² ruin
and $p_z$ the probability of his winning. In random-walk terminology
$q_z$ and $p_z$ are the probabilities that a particle starting at z will be
absorbed at 0 and a, respectively. We shall show that $p_z + q_z = 1$, so
that we need not consider the possibility of an unending game.

After the first trial the gambler’s fortune is either z − 1 or z + 1,
and therefore we must have


(2.1)    $q_z = p\,q_{z+1} + q\,q_{z-1}$


provided 1 < z < a − 1. For z = 1 the first trial may lead to ruin,
and (2.1) is to be replaced by $q_1 = p\,q_2 + q$. Similarly, for z = a − 1
the first trial may result in victory, and therefore $q_{a-1} = q\,q_{a-2}$. To
unify our equations we define


(2.2)    $q_0 = 1, \qquad q_a = 0.$


2 Strictly speaking, the probability of ruin is defined in a sample space of infi- 
nitely prolonged games, but we can work with the sample space of n trials. The 
probability of ruin in less than n trials increases with n and has therefore a limit. 
We call this limit “the probability of ruin.” All probabilities in this chapter may 
be interpreted in this way without reference to infinite sample spaces (cf. the 
introduction to chapter VIII).

With this convention the probability $q_z$ of ruin satisfies (2.1) for
z = 1, 2, ..., a − 1.

Equation (2.1) is a difference equation, and (2.2) represents the boundary
conditions on $q_z$. We shall derive an explicit expression for $q_z$ by
the method of particular solutions, which will also be used in more general
cases.

Suppose first that p ≠ q. It is easily verified that the difference
equation (2.1) admits of the two particular solutions $q_z = 1$ and
$q_z = (q/p)^z$. It follows that for arbitrary constants A and B the
sequence


(2.3)    $q_z = A + B\left(\frac{q}{p}\right)^z$


represents a formal solution of (2.1). The boundary conditions (2.2)
will hold if, and only if, A and B satisfy the two linear equations
$A + B = 1$ and $A + B(q/p)^a = 0$. Thus


(2.4)    $q_z = \frac{(q/p)^a - (q/p)^z}{(q/p)^a - 1}$


is a formal solution of the difference equation (2.1), satisfying the 
boundary conditions (2.2). In order to prove that (2.4) is the required 
probability of ruin it remains to show that the solution is unique, that 
is, that all solutions of (2.1) are of the form (2.3). Now, given an 
arbitrary solution of (2.1), the two constants A and B can be chosen 
so that (2.3) will agree with it for z = 0 and z = 1. From these two
values all other values can be found by substituting in (2.1) successively
z = 1, 2, 3, .... Therefore two solutions which agree for z = 0
and z = 1 are identical, and hence every solution is of the form (2.3). 

Our argument breaks down if p = q = ½, for then (2.4) is meaningless
because in this case the two formal particular solutions $q_z = 1$ and
$q_z = (q/p)^z$ are identical. However, when p = q = ½ we have a second
solution in $q_z = z$, and therefore $q_z = A + Bz$ is a solution of (2.1)
depending on two constants. In order to satisfy the boundary conditions
(2.2) we must put A = 1 and A + Ba = 0. Hence


(2.5)    $q_z = 1 - \frac{z}{a}.$

(The same numerical value can be obtained formally from (2.4) by
finding the limit as p → ½, using L’Hospital’s rule.)

We have thus proved that the required probability of the gambler’s
ruin is given by (2.4) if p ≠ q, and by (2.5) if p = q = ½. The probability
$p_z$ of the gambler’s winning the game equals the probability of
his adversary’s ruin and is therefore obtained from our formulas on
replacing p, q, and z by q, p, and a − z, respectively. It is readily
seen that $p_z + q_z = 1$, as stated previously.
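
The two formulas (2.4) and (2.5) are easy to evaluate numerically. The sketch below is a hedged illustration only: the function names `ruin_probability` and `win_probability` and the tolerance used to detect the symmetric case are assumptions of this example, and the direct use of powers of q/p may overflow for very large a.

```python
def ruin_probability(z, a, p):
    """Probability q_z of ruin for initial capital z and combined capital a."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:                 # symmetric case, formula (2.5)
        return 1.0 - z / a
    r = q / p
    return (r**a - r**z) / (r**a - 1.0)    # formula (2.4)


def win_probability(z, a, p):
    """Probability p_z of winning: interchange p and q, replace z by a - z."""
    return ruin_probability(a - z, a, 1.0 - p)
```

With z = 90, a = 100, p = 0.45 the first function returns a value close to 0.866, and the two probabilities add to one, in agreement with the statement $p_z + q_z = 1$.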

We can reformulate our result as follows: Let a gambler with an 
initial capital z play against an infinitely rich adversary who is always 
willing to play, although the gambler has the privilege of stopping at his
pleasure. The gambler adopts the strategy of playing until he either loses
his capital or increases it to a (with a net gain a − z). Then $q_z$ is the
probability of his losing and $1 - q_z$ the probability of his winning.

Under this system the gambler’s ultimate gain or loss is a random 
variable G which assumes the values a − z and −z with probabilities
$1 - q_z$ and $q_z$, respectively. The expected gain is


(2.6)    $E(G) = a(1 - q_z) - z.$


Clearly E(G) = 0 if, and only if, p = q. This means that, with the
system described, a “fair” game remains fair, and no “unfair” game 
can be changed into a “fair” one. 

From (2.5) we see that in the case p = q a player with initial capital
z = 999 has a probability 0.999 to win a dollar before losing his capital.
With q = 0.6, p = 0.4 the game is unfavorable indeed, but still the
probability (2.4) of winning a dollar before losing the capital is about ⅔.
In general, a gambler with a relatively large initial capital z has a reasonable
chance to win a small amount a − z before being ruined.³

Let us now investigate the effect of changing stakes. Changing the 
unit from a dollar to a half-dollar is equivalent to doubling the initial 
capitals. The corresponding probability of ruin $q_z^*$ is obtained from
(2.4) on replacing z by 2z and a by 2a:


(2.7)    $q_z^* = \frac{(q/p)^{2a} - (q/p)^{2z}}{(q/p)^{2a} - 1} = q_z \cdot \frac{(q/p)^a + (q/p)^z}{(q/p)^a + 1}.$


For q > p the last fraction is greater than unity and $q_z^* > q_z$. We
restate this conclusion as follows: If the stakes are doubled while the
initial capitals remain unchanged, the probability of ruin decreases for


3 A certain man used to visit Monte Carlo year after year and was always successful
in recovering the costs of his vacations. He firmly believed in a magic
power over chance. Actually his experience is not surprising. Assuming that he
started with ten times the ultimate gain, the chances of success in any year are
nearly 10/11. The probability of an unbroken sequence of ten successes is about
$(1 - \tfrac{1}{11})^{10} \approx e^{-1} \approx 0.37$. Thus continued success is by no means improbable.
Moreover, one failure would, of course, be blamed on an oversight or momentary
indisposition.

the player whose probability of success is p < ½ and increases for the
adversary (for whom the game is advantageous). Suppose, for example,
that Peter owns 90 dollars and Paul 10, and let p = 0.45, the game 
being unfavorable to Peter. If at each trial the stake is one dollar, 
table 1 shows the probability of Peter’s ruin to be 0.866, approximately. 
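
The effect of the stake on Peter's chances can be checked numerically. The sketch below assumes, as in the derivation of (2.7), that changing the stake simply rescales both capitals to units of the stake; the helper `ruin_with_stake` and its parameter k (dollars staked per trial, assumed to divide both z and a) are illustrative names introduced for this example.

```python
def ruin_probability(z, a, p):
    """Formulas (2.4) and (2.5) for the probability of ruin."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return 1.0 - z / a
    r = q / p
    return (r**a - r**z) / (r**a - 1.0)


def ruin_with_stake(z, a, p, k):
    """Ruin probability when k dollars are staked at each trial;
    capitals are counted in units of the stake (k must divide z and a)."""
    return ruin_probability(z // k, a // k, p)


# Peter owns 90 dollars, Paul 10, and p = 0.45
for k in (1, 2, 5, 10):
    print(k, round(ruin_with_stake(90, 100, 0.45, k), 3))
```

Under these assumptions the probability of Peter's ruin falls steadily as the stake grows, from about 0.866 at a stake of one dollar to about 0.210 at a stake of ten dollars (the rows z = 90, a = 100 and z = 9, a = 10 of table 1).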


TABLE 1


ILLUSTRATING THE CLASSICAL RUIN PROBLEM


                                    Probability of           Expected
  p      q         z        a      Ruin     Success      Gain       Duration

 0.5    0.5        9       10     0.1       0.9           0                9
 0.5    0.5       90      100     0.1       0.9           0              900
 0.5    0.5      900    1,000     0.1       0.9           0           90,000
 0.5    0.5      950    1,000     0.05      0.95          0           47,500
 0.5    0.5    8,000   10,000     0.2       0.8           0       16,000,000
 0.45   0.55       9       10     0.210     0.790        −1.1             11
 0.45   0.55      90      100     0.866     0.134       −76.6            765.6
 0.45   0.55      99      100     0.182     0.818       −17.2            171.8
 0.4    0.6       90      100     0.983     0.017       −88.3            441.3
 0.4    0.6       99      100     0.333     0.667       −32.3            161.7


The initial capital is z. The game terminates with ruin (loss z) or capital a
(gain a − z).




probability of ruin from (2.4), replacing z 
probability of ruin decreases as k increases. 


mean the end of all insurance business, for the careful driver who insures
against liability obviously plays a game that is technically “unfair.”
Actually, there exists no theorem in probability to discourage
such a driver from taking insurance.


3. EXPECTED DURATION OF THE GAME 


The probability distribution of the duration of the game will be de- 
duced in the following sections. However, its expected value can be 
derived by a much simpler method which is of such wide applicability 
that it will now be explained at the cost of a slight duplication. 

We are still concerned with the classical ruin problem formulated at 
the beginning of this chapter. We shall assume as known the fact that 
the duration of the game has a finite expectation $D_z$. A rigorous proof
will be given in the next section.

If the first trial results in success the game continues as if the initial
position had been z + 1. The conditional expectation of the duration
assuming success at the first trial is therefore $D_{z+1} + 1$; similarly,
assuming failure at the first trial it is $D_{z-1} + 1$. This argument
shows that the expected duration $D_z$ satisfies the difference equation


(3.1)    $D_z = p\,D_{z+1} + q\,D_{z-1} + 1, \qquad 0 < z < a$

with the boundary conditions

(3.2)    $D_0 = 0, \qquad D_a = 0.$


The appearance of the term 1 makes the difference equation (3.1)
non-homogeneous. If p ≠ q, then $D_z = z/(q - p)$ is a formal solution
of (3.1). The difference $\Delta_z$ of any two solutions of (3.1) satisfies the
homogeneous equations $\Delta_z = p\,\Delta_{z+1} + q\,\Delta_{z-1}$, and we know already
that all solutions of this equation are of the form $A + B(q/p)^z$. It
follows that when p ≠ q all solutions of (3.1) are of the form


(3.3)    $D_z = \frac{z}{q-p} + A + B\left(\frac{q}{p}\right)^z.$


The boundary conditions (3.2) require that $A + B = 0$ and
$A + B(q/p)^a = -a/(q - p)$. Solving for A and B, we find

(3.4)    $D_z = \frac{z}{q-p} - \frac{a}{q-p}\cdot\frac{1 - (q/p)^z}{1 - (q/p)^a}.$

Again the method breaks down if q = p = ½. In this case we must
replace z/(q − p) by $-z^2$, which is now a solution of (3.1). It follows
that when p = q = ½ all solutions of (3.1) are of the form
$D_z = -z^2 + A + Bz$. The required solution $D_z$ satisfying the boundary
conditions (3.2) is

(3.5)    $D_z = z(a - z).$


The expected duration of the game in the classical ruin problem is given
by (3.4) or (3.5), according as p ≠ q or p = q = ½.

It should be noted that this duration is considerably longer than we 
would naively expect. If two players with 500 dollars each toss a coin 
until one is ruined, the average duration of the game is 250,000 trials. 
If a gambler has only one dollar and his adversary 1000, the average 
duration is 1000 trials. Further examples are found in table 1. 
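
The numerical examples can be reproduced directly from (3.4) and (3.5). The following is a minimal sketch; the function name `expected_duration` and the tolerance used to detect the symmetric case are assumptions of this illustration.

```python
def expected_duration(z, a, p):
    """Expected duration D_z of the game, formulas (3.4) and (3.5)."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:                 # symmetric case, formula (3.5)
        return float(z * (a - z))
    r = q / p
    # formula (3.4)
    return z / (q - p) - a / (q - p) * (1.0 - r**z) / (1.0 - r**a)


print(expected_duration(500, 1000, 0.5))   # two players with 500 dollars each
print(expected_duration(1, 1001, 0.5))     # one dollar against 1000
```

The same function can be used to recompute the last column of table 1, for instance the entry for p = 0.45, z = 90, a = 100.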


Passage to the limit a → ∞. If in the formulas (2.4) and (2.5) for the probability
of ultimate ruin we let a → ∞, we find that


(3.6)    $q_z \to \begin{cases} 1 & \text{if } q \ge p, \\ (q/p)^z & \text{if } q < p. \end{cases}$

Nobody would hesitate to interpret these limits as probabilities of ruin in a game
against an infinitely rich adversary, but axiomatically a random walk on the semi-infinite
interval (0, ∞) should be considered on its own merits. Now saying that in
such a random walk a particle starting at z > 0 reaches the origin is really the
same as saying that in an unrestricted random walk a particle reaches a position
z units to the left from its starting point. This probability has been calculated in
chapter XI, section 3, and agrees with (3.6): In a game against an infinitely rich
adversary the probability of ruin is one if q ≥ p and $(q/p)^z$ if q < p. In the second
case there is no sense in talking about the expected duration of the game since
the game may go on forever. When q > p we get for the expected duration of
the game the limit $z(q - p)^{-1}$, and if q = p the limit is infinite. This agrees with
our knowledge that in a symmetric random walk all first-passage times have an
infinite expectation. (An independent derivation of these results is contained in
the next section.)


*4. GENERATING FUNCTIONS FOR THE DURATION OF 
THE GAME AND FOR THE FIRST-PASSAGE TIMES 


We shall use the method of generating functions to study the dura- 
tion of the game in the classical ruin problem, that is, the restricted 
random walk with absorbing barriers at 0 and a. The initial position 
is z (with 0 < z < a). Let $u_{z,n}$ denote the probability that the process
ends with the nth step at the barrier 0 (gambler’s ruin at the nth
trial). After the first step the position is z + 1 or z − 1, and we conclude
that for 1 < z < a − 1 and n ≥ 1


(4.1)    $u_{z,n+1} = p\,u_{z+1,n} + q\,u_{z-1,n}.$
This is a difference equation analogous to (2.1), but depending on the 


* This section together with the related section 5 may be omitted at first reading.

two variables z and n. In analogy with the procedure of section 2 we
wish to define boundary values $u_{0,n}$, $u_{a,n}$, and $u_{z,0}$ so that (4.1) becomes
valid also for z = 1, z = a − 1, and n = 0. For this purpose we put


(4.2)    $u_{0,n} = u_{a,n} = 0$  when n ≥ 1

and

(4.3)    $u_{0,0} = 1, \qquad u_{z,0} = 0$  when z > 0.


Then (4.1) holds for all z with 0 < z < a and all n ≥ 0.

We now introduce the generating function


(4.4)    $U_z(s) = \sum_{n=0}^{\infty} u_{z,n} s^n.$

Multiplying (4.1) by $s^{n+1}$ and adding for n = 0, 1, 2, ..., we find

(4.5)    $U_z(s) = ps\,U_{z+1}(s) + qs\,U_{z-1}(s), \qquad 0 < z < a$

and equations (4.2) and (4.3) lead to the boundary conditions

(4.6)    $U_0(s) = 1, \qquad U_a(s) = 0.$


Equation (4.5) is a difference equation analogous to (2.1), and the
boundary conditions (4.6) correspond to (2.2). The novelty lies in the
circumstance that the coefficients and the unknown $U_z(s)$ now depend
on the variable s, but as far as the difference equation is concerned, s
is merely an arbitrary constant. We can again apply the method of
section 2 provided we succeed in finding two particular solutions of
(4.5). It is natural to inquire whether there exist two solutions $U_z(s)$
of the form $U_z(s) = \lambda^z(s)$. Substituting this expression into (4.5), we
find that λ(s) must satisfy the quadratic equation


(4.7)    $\lambda(s) = ps\,\lambda^2(s) + qs,$

which has the two roots

(4.8)    $\lambda_1(s) = \frac{1 + (1 - 4pqs^2)^{1/2}}{2ps}, \qquad \lambda_2(s) = \frac{1 - (1 - 4pqs^2)^{1/2}}{2ps}$


(we take 0 < s < 1 and the positive square root).

We have thus found two particular solutions of (4.5) and conclude
as in section 2 that for arbitrary functions A(s) and B(s)


(4.9)    $U_z(s) = A(s)\lambda_1^z(s) + B(s)\lambda_2^z(s)$
is a solution of (4.5). To satisfy the boundary conditions (4.6), we
must have $A(s) + B(s) = 1$ and $A(s)\lambda_1^a(s) + B(s)\lambda_2^a(s) = 0$, or

(4.10)    $U_z(s) = \frac{\lambda_1^a(s)\lambda_2^z(s) - \lambda_1^z(s)\lambda_2^a(s)}{\lambda_1^a(s) - \lambda_2^a(s)}.$

Using the obvious relation $\lambda_1(s)\lambda_2(s) = q/p$, the last formula simplifies
to

(4.11)    $U_z(s) = \left(\frac{q}{p}\right)^z \frac{\lambda_1^{a-z}(s) - \lambda_2^{a-z}(s)}{\lambda_1^a(s) - \lambda_2^a(s)}.$


This is the required generating function of the probability of ruin at the
nth trial (absorption at 0). The corresponding generating function for
the probability of absorption at a is obtained on replacing p, q, z by
q, p, and a − z, respectively. The generating function of the duration
of the game is, of course, the sum of the two generating functions.
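
As a numerical cross-check, the probabilities $u_{z,n}$ can be generated from the recursion (4.1) with the boundary values (4.2) and (4.3), and their power series compared with the closed form (4.11). The sketch below is illustrative only; the function names `ruin_time_probs` and `U_closed` and the chosen parameter values are assumptions of this example.

```python
import math


def ruin_time_probs(z, a, p, n_max):
    """List [u_{z,0}, ..., u_{z,n_max}] computed from the recursion (4.1)."""
    q = 1.0 - p
    col = [0.0] * (a + 1)
    col[0] = 1.0                      # (4.3): u_{0,0} = 1, u_{z,0} = 0 for z > 0
    out = [col[z]]
    for _ in range(n_max):
        new = [0.0] * (a + 1)         # (4.2): u_{0,n} = u_{a,n} = 0 for n >= 1
        for x in range(1, a):
            new[x] = p * col[x + 1] + q * col[x - 1]
        col = new
        out.append(col[z])
    return out


def U_closed(z, a, p, s):
    """Generating function (4.11), built from the roots (4.8); needs 0 < s < 1."""
    q = 1.0 - p
    root = math.sqrt(1.0 - 4.0 * p * q * s * s)
    lam1 = (1.0 + root) / (2.0 * p * s)
    lam2 = (1.0 - root) / (2.0 * p * s)
    return (q / p) ** z * (lam1 ** (a - z) - lam2 ** (a - z)) / (lam1**a - lam2**a)


probs = ruin_time_probs(3, 7, 0.45, 600)
series = sum(u * 0.9**n for n, u in enumerate(probs))
print(series, U_closed(3, 7, 0.45, 0.9))
```

The truncated series and the closed form should agree to within rounding error, since the neglected terms carry the factor $s^n$ with n > 600.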


The Case a = ∞


Our method applies equally to the case a = ∞ which corresponds to
a random walk on (0, ∞) with an absorbing barrier at the origin (or
playing against an infinitely rich adversary). We have now the sole
boundary condition $U_0(s) = 1$. All solutions of (4.5) are of the form
(4.9), but since $\lambda_1(s) > 1$ and $\lambda_2(s) < 1$ for 0 < s < 1, we find that
$U_z(s)$ is unbounded unless A(s) = 0. Hence the required solution is


(4.12)    $U_z(s) = \lambda_2^z(s).$


This is the generating function of the probability that, starting from a 
point z > 0, the particle will be absorbed at the origin exactly at the nth 
trial. 

In other words, in an unrestricted random walk (4.12) is the generating 
function of the distribution of first-passage times through a point z units 
to the left from the initial position. To get a formula for the first passages
to the right we have only to interchange p and q. A glance at
(4.8) will show that in an unrestricted random walk starting from the
origin the first-passage times through a point z > 0 have the generating
functions


(4.13)    $\left(\frac{p}{q}\right)^z \lambda_2^z(s) = \lambda_1^{-z}(s).$


For the particular value z = 1 we find $\lambda_1^{-1}(s)$ as the generating function
for the first passages one unit to the right. The first passage from
0 through an arbitrary z > 1 is the sum of the first-passage times from
0 to 1, from 1 to 2, ..., from z − 1 to z, and is therefore the sum of z
independent random variables each having the generating function
$\lambda_1^{-1}(s)$. This explains why in (4.13) we find the zth power of a generating
function.

Substituting s = 1 into (4.12), we find the probability of ruin in the
case of an infinitely rich adversary. It is $(q/p)^z$ or 1, according as
q ≤ p or q > p.


*5. EXPLICIT EXPRESSIONS 


We shall now derive an explicit formula for $u_{z,n}$ by expanding $U_z(s)$
into partial fractions. Formally, the expression (4.11) for $U_z(s)$ depends
on a square root, but in reality $U_z(s)$ is a rational function. In
fact, expanding the expressions (4.8) according to the binomial theorem,
we see that the difference $\lambda_1^z(s) - \lambda_2^z(s)$ is a rational function in s
multiplied by $(1 - 4pqs^2)^{1/2}$; this root appears as a factor in both the
numerator and the denominator of (4.11), and hence $U_z(s)$ is the ratio
of two polynomials. The degree of the denominator is a − 1 for a odd
and a − 2 for a even; the degree of the numerator is a − 1 when
a − z is odd and a − 2 when a − z is even. In no case can the degree
of the numerator exceed the degree of the denominator by more than
one. Hence for n > 1 we can compute $u_{z,n}$ from equation XI(4.8),
provided all the roots of the denominator are distinct.

We could calculate the roots of the denominator and the corresponding
coefficients $\rho_\nu$ directly, but the algebra simplifies if we introduce a
new independent variable φ by


(5.1)    $\frac{1}{\cos\phi} = 2(pq)^{1/2} s.$

From (4.8) we find

(5.2)    $\lambda_1(s) = \left(\frac{q}{p}\right)^{1/2}(\cos\phi + i\sin\phi) = \left(\frac{q}{p}\right)^{1/2} e^{i\phi}$

(with $\lambda_2(s)$ given by the same expression with i replaced by −i),
and hence from (4.11)

(5.3)    $U_z(s) = \left(\frac{q}{p}\right)^{z/2} \frac{\sin (a-z)\phi}{\sin a\phi}.$


The roots of the denominator are obviously φ = 0, π/a, 2π/a, ....
The corresponding values of s are


(5.4)    $s_\nu = \frac{1}{2(pq)^{1/2}\cos (\pi\nu/a)}.$

We get all possible values for $s_\nu$, putting ν = 0, 1, ..., a. However,
to ν = 0 and ν = a there correspond the extraneous values φ = 0, π,
which are also roots of the numerator in (5.3), and if a is even, no
number $s_\nu$ corresponds to ν = ½a. Hence, when a is odd, we get all
a − 1 roots $s_\nu$ putting ν = 1, 2, ..., a − 1; when a is even, the value
ν = ½a must be omitted. We should disregard those $s_\nu$ which are also
roots of the numerator, but for them (5.6) leads automatically to
$\rho_\nu = 0$.

We know that

(5.5)    $\left(\frac{q}{p}\right)^{z/2} \frac{\sin (a-z)\phi}{\sin a\phi} = A + Bs + \frac{\rho_1}{s_1 - s} + \cdots + \frac{\rho_{a-1}}{s_{a-1} - s}.$


To find $\rho_\nu$ multiply both sides by $s_\nu - s$ and let $s \to s_\nu$. We get (putting
$\phi_\nu = \pi\nu/a$) as in formula XI(4.5)


(5.6)    $\rho_\nu = \left(\frac{q}{p}\right)^{z/2} \frac{-\sin (a-z)\phi_\nu}{a\cos \nu\pi \cdot (d\phi/ds)_{s=s_\nu}} = \left(\frac{q}{p}\right)^{z/2} \frac{\sin (z\pi\nu/a)\,\sin (\pi\nu/a)}{2a(pq)^{1/2}\cos^2 (\pi\nu/a)}.$


Hence we get finally from (5.5) for the coefficient $u_{z,n}$ of $s^n$ when
n > 1


(5.7)    $u_{z,n} = a^{-1}\, 2^n\, p^{(n-z)/2}\, q^{(n+z)/2} \sum_{\nu=1}^{a-1} \cos^{n-1}\frac{\pi\nu}{a}\,\sin\frac{\pi\nu}{a}\,\sin\frac{\pi z\nu}{a}.$


(Strictly speaking, the term ν = ½a should be omitted when a is even
but it is zero anyway and therefore does no harm.)

For n > 1 formula (5.7) represents the probability of ruin (absorption)
at the nth trial. It goes back to Lagrange and has been derived in many
different ways.⁴ Despite an honorable history and its availability in
textbooks, the formula is rediscovered at frequent intervals. For an 
alternative explicit expression see problem 13; for limiting forms see 
section 6 and problem 14 (analogous formulas for reflecting barriers 
are derived in chapter XVI, section 3). 
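
Formula (5.7) can be compared term by term with the probabilities obtained from the recursion (4.1). The following sketch is only an illustration under stated assumptions: the function names `ruin_at_trial` and `ruin_at_trial_recursive` are hypothetical, and the comparison is carried out for a few small values of a.

```python
from math import cos, pi, sin


def ruin_at_trial(z, a, p, n):
    """Formula (5.7): probability of ruin exactly at trial n (for n > 1)."""
    q = 1.0 - p
    total = sum(
        cos(pi * v / a) ** (n - 1) * sin(pi * v / a) * sin(pi * z * v / a)
        for v in range(1, a)
    )
    return 2.0**n * p ** ((n - z) / 2) * q ** ((n + z) / 2) * total / a


def ruin_at_trial_recursive(z, a, p, n):
    """The same probability from the recursion (4.1) with (4.2) and (4.3)."""
    q = 1.0 - p
    col = [1.0] + [0.0] * a           # column n = 0
    for _ in range(n):
        new = [0.0] * (a + 1)
        for x in range(1, a):
            new[x] = p * col[x + 1] + q * col[x - 1]
        col = new
    return col[z]


print(ruin_at_trial(3, 7, 0.45, 9), ruin_at_trial_recursive(3, 7, 0.45, 9))
```

The trigonometric sum and the recursion should agree up to rounding error; when n − z is odd both give zero, the sum only approximately so because of floating-point cancellation.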

4 An elementary derivation using trigonometric interpolation was given by Ellis,
Cambridge Mathematical Journal, vol. 4 (1844), or The Mathematical and Other
Writings of R. E. Ellis, Cambridge and London, 1863.

If we let a → ∞, the sum in (5.7) may be interpreted as a Riemann
sum approximating an integral. In this way we find that in a game
against an infinitely rich adversary (single absorbing barrier at 0) the
probability $w_{z,n}$ that a player with initial capital z > 0 will be ruined
exactly at the nth step is


(5.8)    $w_{z,n} = 2^n p^{(n-z)/2} q^{(n+z)/2} \int_0^1 \cos^{n-1} \pi x \,\sin \pi x \,\sin \pi x z \; dx.$

This integral can be expressed in an elementary way⁵ as follows


(5.9)    $w_{z,n} = \frac{z}{n}\binom{n}{\frac12(n-z)} p^{(n-z)/2} q^{(n+z)/2},$


where the binomial coefficient is to be interpreted as zero if ½(n − z)
is not an integer of the interval [0, n]. The corresponding generating
function was found to be $\lambda_2^z(s)$ (see end of section 4).
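
Formula (5.9) lends itself to a direct numerical check. The sketch below is an illustration only: the function name `first_passage_prob` is an assumption, and the sums are truncated at n = 400, which is adequate for the chosen value p = 0.3 but not in general.

```python
from math import comb


def first_passage_prob(z, n, p):
    """Formula (5.9): ruin exactly at trial n against an infinitely rich adversary."""
    if n < z or (n - z) % 2:
        return 0.0                    # binomial coefficient interpreted as zero
    q = 1.0 - p
    k = (n - z) // 2                  # number of steps to the right
    return z / n * comb(n, k) * p**k * q ** (n - k)


p, z = 0.3, 2
total = sum(first_passage_prob(z, n, p) for n in range(1, 401))
mean = sum(n * first_passage_prob(z, n, p) for n in range(1, 401))
print(total, mean)
```

For q > p the probabilities should add up to one and the mean should be close to z/(q − p), in agreement with the limits found at the end of section 3.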


6. PASSAGE TO THE LIMIT; DIFFUSION PROCESSES 


It has already been pointed out that our random-walk models serve 
as a first approximation to the theory of diffusion and Brownian motion, 
where small particles are exposed to a tremendous number of molecular 
shocks. Each shock has a negligible effect, but the superposition of 
many small actions produces an observable motion. Accordingly, we 
now want to study random walks where the individual steps are ex- 
tremely small and occur in very rapid succession. In the limit the 
process will appear as a continuous motion. The point of interest is 
that in passing to this limit our formulas remain meaningful and agree 
with physically significant formulas of diffusion theory which can be 
derived under much more general conditions by more streamlined 
methods. This explains partly why the random-walk model, despite 
its crudeness, describes diffusion processes reasonably well; only the 
limiting case is physically significant, and various discrete models lead 
to the same limiting formulas. The situation is in many ways analogous 
to the conditions of the central limit theorem where the cumulative 

5 For p = q = ½ formula (5.9) reduces to the formula III(4.11) for the first-
passage time distribution. It is by no means easy to verify that (5.8) and (5.9)
agree. Perhaps the simplest way is to show that both formulas represent solutions 
of the difference equation (4.1) with the boundary conditions (4.2)-(4.3) at the 
origin. 

The limiting formulas of the present section agree with those of the now classical 
Einstein-Wiener theory. The newer, more refined theories (Uhlenbeck, Ornstein) 
are not considered here. Credit for discovering the connection between random 
walks and diffusion is due principally to L. Bachelier (1870- ). His work is fre- 
quently of a heuristic nature, but he derived many new results. Kolmogorov’s 
theory of stochastic processes of the Markov type is based largely on Bachelier’s 
ideas. See in particular L. Bachelier, Calcul des probabilités, Paris, 1912.

effect of many chance components is practically independent of the
nature of the individual components. 

Let us begin with an unrestricted random walk starting at the origin,
and let $v_{x,n}$ be the probability that the nth step takes the particle to the
position x. If r among the n steps are directed to the right, n − r are
directed to the left, and the total displacement is r − (n − r) = 2r − n
units. This displacement can equal x only if n and x are either both
even or both odd (which means that after an even number of steps the
abscissa x is an even integer). Out of n steps r can be selected in $\binom{n}{r}$
ways, and therefore


(6.1)    $v_{x,n} = \binom{n}{\frac12(n+x)} p^{(n+x)/2} q^{(n-x)/2};$

again the binomial coefficient should be interpreted as 0 whenever
½(n + x) is not an integer in the interval [0, n].

An alternative way of deriving (6.1) uses the argument which led to
the difference equation (4.1) and the boundary conditions (4.2) and
(4.3). It can be verified that $v_{x,n}$ must satisfy the difference equation


(6.2)    $v_{x,n+1} = p\,v_{x-1,n} + q\,v_{x+1,n}$


with the boundary conditions

(6.3)    $v_{0,0} = 1, \qquad v_{x,0} = 0$  for x ≠ 0.


Given (6.3), we put in (6.2) successively n = 0, 1, 2, ... and get first all
values $v_{x,1}$, and then successively $v_{x,2}$, $v_{x,3}$, .... This shows that the
conditions (6.2) and (6.3) uniquely determine $v_{x,n}$. On the other hand,
it is readily seen that (6.1) is a solution. 
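
That (6.1) satisfies (6.2) and (6.3) can also be confirmed numerically. The lines below are a small sketch; the function name `v` mirrors the notation $v_{x,n}$, while the value p = 0.3 and the range of n are arbitrary choices made for this illustration.

```python
from math import comb


def v(x, n, p):
    """Formula (6.1): probability that the nth step leads to position x."""
    if abs(x) > n or (n + x) % 2:
        return 0.0                    # binomial coefficient interpreted as zero
    r = (n + x) // 2                  # number of steps to the right
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


p = 0.3
worst = max(
    abs(v(x, n + 1, p) - (p * v(x - 1, n, p) + (1.0 - p) * v(x + 1, n, p)))
    for n in range(0, 12)
    for x in range(-14, 15)
)
print(worst)   # largest violation of (6.2), of the order of rounding error
```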

Let us now change the unit of length so that each step has length Δx
and suppose that the time between any two consecutive steps is Δt. During
time t the particle performs about t/Δt jumps, and a displacement x is
now equivalent to x/Δx units. Only multiples of Δx and Δt represent
meaningful coordinates, but in the limit Δx → 0, Δt → 0 every displacement
and all times become possible.

We must not expect sensible results if Δx and Δt approach zero in an
arbitrary manner, for the maximum possible displacement in time t
amounts to tΔx/Δt, so that in the limit no motion exists if Δx/Δt → 0.
Physically speaking, we must keep the x- and t-scales in an appropriate
ratio or the process will degenerate in the limit, the variances tending
to zero or infinity. To find the proper ratio note that the total displacement
during time t is the sum of about t/Δt mutually independent
random variables each having the mean (p − q)Δx and variance
{1 − (p − q)²}(Δx)² = 4pq(Δx)². The mean and variance of the total
displacement in time t are therefore about t(p − q)Δx/Δt and
4pqt(Δx)²/Δt, respectively. To obtain reasonable results we must let
Δx and Δt approach zero in such a way that this mean and variance
remain finite for all t.
The finiteness of the variance requires that (Δx)²/Δt should remain
bounded; the finiteness of the mean implies that p − q must be of the
order of magnitude of Δx. This suggests putting


(6.4)    $\frac{(\Delta x)^2}{\Delta t} = 2D, \qquad p = \frac12 + \frac{c}{2D}\,\Delta x, \qquad q = \frac12 - \frac{c}{2D}\,\Delta x,$


where D and c are constants. The value of D introduces only a scale
factor; for mathematical simplicity it is best to put D = 1, but we keep
D unspecified to facilitate comparison with physical theories. The
constants D and c are, respectively, the diffusion coefficient and the
drift. If c = 0, the random walk is symmetric; in general, the sign of
c determines the direction of the drift. In the limit p and q approach
½; with any other norming the particle would drift away so fast that
the probability of finite displacements would tend to zero.
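
The effect of the norming can be seen numerically. The sketch below assumes the reading (Δx)²/Δt = 2D and p − q = (c/D)Δx of (6.4); the function name `scaled_moments` and the sample values of t, D, c are illustrative assumptions.

```python
def scaled_moments(t, D, c, dx):
    """Mean and variance of the displacement at time t for step length dx,
    with the time step and p, q chosen according to the norming (6.4)."""
    dt = dx * dx / (2.0 * D)          # from (dx)^2 / dt = 2D
    p = 0.5 + c * dx / (2.0 * D)
    q = 1.0 - p
    n = t / dt                        # approximate number of steps in time t
    mean = n * (p - q) * dx           # t (p - q) dx / dt
    var = n * 4.0 * p * q * dx * dx   # 4 p q t (dx)^2 / dt
    return mean, var


for dx in (0.1, 0.01, 0.001):
    print(dx, scaled_moments(1.5, 0.8, 0.3, dx))
```

Under this norming the mean equals 2ct for every Δx, while the variance approaches 2Dt as Δx decreases.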

We use the norming (6.4) to pass to the limit Δx → 0, Δt → 0.
The total displacement at time t ≈ nΔt is determined by n Bernoulli
trials, and therefore the limiting form of $v_{x,n}$ is given by the normal
distribution. For a fixed Δx the displacement is the sum of finitely
many independent variables, and its mean is t(p − q)Δx/Δt = 2ct; its
variance 4pqt(Δx)²/Δt ≈ 2Dt. Therefore the probability that at time t
the displacement lies between $x_0$ and $x_1$ ($x_0 < x_1$) tends to


(6.5)    $(2\pi)^{-1/2} \int_{y_0}^{y_1} e^{-y^2/2}\, dy$

where $y_1 = (x_1 - 2ct)/(2Dt)^{1/2}$ and $y_0 = (x_0 - 2ct)/(2Dt)^{1/2}$.

As for equation (6.2), we pass to the usual functional notation and
write it in the form v(x, t+Δt) = p·v(x−Δx, t) + q·v(x+Δx, t). Expanding
according to Taylor’s theorem up to terms of second order,
we get formally


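(6.6)    $\Delta t\,\frac{\partial v(x,t)}{\partial t} = (q-p)\,\Delta x\,\frac{\partial v(x,t)}{\partial x} + \frac{(\Delta x)^2}{2}\,\frac{\partial^2 v(x,t)}{\partial x^2}.$

Using (6.4), we get in the limit


(6.7)    $\frac{\partial v(x,t)}{\partial t} = -2c\,\frac{\partial v(x,t)}{\partial x} + D\,\frac{\partial^2 v(x,t)}{\partial x^2}.$

This is the Fokker-Planck equation for diffusion with drift, which can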
be derived from more general and more convincing assumptions. In 
the usual theory, the solution (6.5) is derived from (6.7), but we have 
obtained both results by the same limiting process. Our procedure is 
only heuristic but can be justified rigorously. All formulas of the dis- 
crete random walk permit a similar passage to the limit. 
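
The agreement between the normal limit (6.5) and equation (6.7) can be tested by finite differences. In the sketch below the normal density with mean 2ct and variance 2Dt is written in the variable x; the function names, the step h, and the sample point are assumptions made for this illustration.

```python
from math import exp, pi, sqrt


def density(x, t, D, c):
    """Normal density with mean 2ct and variance 2Dt (integrand of (6.5) in x)."""
    return exp(-((x - 2.0 * c * t) ** 2) / (4.0 * D * t)) / sqrt(4.0 * pi * D * t)


def fokker_planck_residual(x, t, D, c, h=1e-3):
    """Left side minus right side of (6.7), approximated by central differences."""
    v_t = (density(x, t + h, D, c) - density(x, t - h, D, c)) / (2.0 * h)
    v_x = (density(x + h, t, D, c) - density(x - h, t, D, c)) / (2.0 * h)
    v_xx = (
        density(x + h, t, D, c) - 2.0 * density(x, t, D, c) + density(x - h, t, D, c)
    ) / (h * h)
    return v_t - (-2.0 * c * v_x + D * v_xx)


print(fokker_planck_residual(0.7, 1.0, 0.8, 0.3))   # close to zero
```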

As a further example, consider the limiting form of the probabilities
for the first passage. For simplicity let us first consider formula (5.9)
which corresponds to a single barrier. Of the two quantities $w_{z,n}$
and $w_{z,n+1}$, one is necessarily zero. The sum $w_{z,n} + w_{z,n+1}$ represents,
asymptotically, the probability of absorption during the time interval
(t, t + 2Δt). We shall show that $w_{z,n} + w_{z,n+1} \sim f(z, t)(2\Delta t)$, where
f(z, t) is a continuous function. Then the limiting probability of absorption
within any time interval ($t_1$, $t_2$) is the integral of f(z, t) extended
over that interval. When n − z is even, we have $w_{z,n+1} = 0$,
and to find f(z, t) we must replace z in (5.9) by z/Δx and n by t/Δt,
and apply (6.4). Using the normal approximation to the binomial distribution
and the last equation (6.9), we find easily⁷


(6.8)    $f(z,t) \sim \frac{z}{2(\pi D t^3)^{1/2}}\; e^{-\frac14 (z+2ct)^2/Dt}.$

This is the limiting form of (5.9); again it coincides with the corresponding
formula of diffusion theory. In fact, it is easily verified that
f(−z, t) is a solution of (6.7). (In the definition of $w_{z,n}$ the variable
z plays the role of −x in $v_{x,n}$.)

7 In the symmetric case c = 0 (i.e., p = q), formula (6.8) agrees with the limiting
distribution for first-passage times derived by elementary methods in III(8.c).

A similar argument applies to (5.7). An inspection of this formula
shows that the contributions of ν = k and ν = a − k cancel if n − z
is odd and add if n − z is even. Hence we get the limiting form of
$f(z,t) \sim (u_{z,n} + u_{z,n+1})/(2\Delta t)$ by extending in (5.7) the sum twice
over 1 ≤ ν < a/2. Replacing z, a, n respectively by z/Δx, a/Δx, t/Δt
and observing that for fixed ν


(6.9)    $\sin\frac{\pi\nu\,\Delta x}{a} \sim \frac{\pi\nu\,\Delta x}{a},$
         $\cos^{t/\Delta t}\frac{\pi\nu\,\Delta x}{a} \sim \left(1 - \frac{D\pi^2\nu^2\,\Delta t}{a^2}\right)^{t/\Delta t} \to e^{-D\pi^2\nu^2 t/a^2},$
         $(4pq)^{t/2\Delta t}\left(\frac{q}{p}\right)^{z/2\Delta x} \to e^{-c(ct+z)/D},$


we obtain formally the limiting form


(6.10)    $f(z,t) \sim 2\pi D a^{-2}\, e^{-c(ct+z)/D} \sum_{\nu=1}^{\infty} \nu\, e^{-D\pi^2\nu^2 t/a^2} \sin\frac{\pi z\nu}{a}.$

The formal passage to the limit is justified because of uniform con- 
vergence: the contribution of the terms with large v is negligible both 
in (6.10) and in the original sum (5.7) (where we have v < a/2). 

In diffusion theory (6.10) is known as Fürth’s formula for first
passages and is derived directly from the Fokker-Planck equation. In
free diffusion the integral over (6.10), extended over the time interval
($t_1$, $t_2$), gives the probability that a particle starting at z > 0 will within
that time interval for the first time reach the origin without having 
previously passed the barrier a. 
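
When the second barrier a is far away, the series (6.10) should practically coincide with the single-barrier density (6.8). The sketch below compares the two numerically; the function names, the truncation of the series after a fixed number of terms, and the sample values are assumptions of this illustration.

```python
from math import exp, pi, sin, sqrt


def first_passage_density(z, t, D, c):
    """Formula (6.8): single absorbing barrier at the origin."""
    return z / (2.0 * sqrt(pi * D * t**3)) * exp(-((z + 2.0 * c * t) ** 2) / (4.0 * D * t))


def furth_density(z, t, a, D, c, terms=2000):
    """Formula (6.10), with the series truncated after `terms` terms."""
    series = sum(
        nu * exp(-D * pi * pi * nu * nu * t / (a * a)) * sin(pi * z * nu / a)
        for nu in range(1, terms + 1)
    )
    return 2.0 * pi * D / (a * a) * exp(-c * (c * t + z) / D) * series


print(first_passage_density(1.0, 1.0, 1.0, 0.3))
print(furth_density(1.0, 1.0, 50.0, 1.0, 0.3))   # distant barrier a = 50
print(furth_density(1.0, 1.0, 2.0, 1.0, 0.3))    # nearby barrier a = 2
```

With a nearby barrier the density given by (6.10) is smaller than (6.8), since paths that reach a before the origin no longer contribute.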


*7. RANDOM WALKS IN THE PLANE AND SPACE 


In a two-dimensional random walk the particle moves in unit steps
in one of the four directions parallel to the x- and y-axes. For a particle
starting at the origin the possible positions are all points of the
plane with integral-valued coordinates. Each position has four neigh- 
bors. Similarly, in three dimensions each position has six neighbors. In 
order to define the random walk the corresponding four or six prob- 
abilities must be specified. For simplicity we shall consider only the 
symmetric case where all directions have the same probability. The
complexity of problems is considerably greater than in one dimension, 
for now the domains to which the particle is restricted may have arbi- 
trary shapes so that complicated boundaries take the place of the 
single-point barriers in the one-dimensional case. 

We begin with an interesting theorem due to Polya.’ 


Theorem. In the symmetric random walks in one and two dimensions
there is probability one that the particle will sooner or later (and therefore
infinitely often) return to its initial position. In three dimensions, how-
ever, this probability is only about 0.35 (the expected number of returns
is then $0.65 \sum k(0.35)^k = 0.35/0.65 \approx 0.53$).


Before proving the theorem let us give two alternative formulations, 
both due to Polya. First, it is almost obvious that the theorem implies 


* This section treats a special topic and may be omitted at first reading.

8 G. Polya, Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die
Irrfahrt im Strassennetz, Mathematische Annalen, vol. 84 (1921), pp. 149-160. The
numerical value 0.35 was calculated by W. H. McCrea and F. J. W. Whipple,
Random paths in two and three dimensions, Proceedings of the Royal Society of
Edinburgh, vol. 60 (1940), pp. 281-298.

that in one and two dimensions there is probability 1 that the particle will
pass infinitely often through every possible point; in three dimensions this
is not true, however. Thus the statement "all roads lead to Rome" is,
in a way, justified in two dimensions.

Alternatively, consider two particles performing independent sym-
metric random walks, the steps occurring simultaneously. Will they
ever meet? To simplify language let us define the distance of two 
possible positions as the smallest number of steps leading from one 
position to the other. (Then distance = sum of absolute differences of 
the coordinates). If the two particles move one step each, their mutual 
distance either remains the same or changes by two units, and so their 
distance either is even at all times or else is always odd. In the second 
case the two particles can never occupy the same position. In the 
first case it is readily seen that the probability of their meeting at the 
nth step equals the probability of the first particle’s reaching in 2n 
steps the initial position of the second particle. Hence our theorem 
states that in two, but not in three, dimensions the two particles are 
sure infinitely often to occupy the same position. If the initial dis- 
tance of the two particles is odd, a similar argument shows that they 
will infinitely often occupy neighboring positions. If this is called 
meeting, then our theorem asserts that in one and two dimensions the 
two particles are certain to meet infinitely often, but in three dimensions 
there is a positive probability that they never meet. 


Proof. For one dimension the theorem has been proved in example
XIII(3.b), except that there we referred to a coin-tossing game rather
than to a symmetric random walk. The proof for two and three dimen-
sions proceeds along the same lines. Let $u_n$ be the probability that
the nth trial takes the particle to the initial position. According to
theorem 2 of chapter XIII, section 3, we have to prove that in the case
of two dimensions $\sum u_n$ diverges, whereas in the case of three dimensions
$\sum u_n \approx 0.53$. In two dimensions a return to the initial position is pos-
sible only if the numbers of steps in the positive x- and y-directions
equal those in the negative x- and y-directions, respectively. Hence
$u_n = 0$ if $n$ is odd and (using the multinomial distribution VI(9.2))

(7.1)  $u_{2n} = \frac{1}{4^{2n}} \sum_{k=0}^{n} \frac{(2n)!}{k!\,k!\,(n-k)!\,(n-k)!} = \frac{1}{4^{2n}} \binom{2n}{n} \sum_{k=0}^{n} \binom{n}{k}^{2}.$

The last expression equals $4^{-2n}\binom{2n}{n}^{2}$, by formula II(12.11). Stir-
ling's formula shows that $u_{2n}$ is of the order of magnitude $1/n$, so that
$\sum u_{2n}$ diverges as asserted.


In the case of three dimensions we find similarly

(7.2)  $u_{2n} = \frac{1}{6^{2n}} \sum_{j,k} \frac{(2n)!}{j!\,j!\,k!\,k!\,(n-j-k)!\,(n-j-k)!},$

the summation extending over all $j, k$ with $j + k \le n$. It is easily
verified that

(7.3)  $u_{2n} = \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,k} \left\{ \frac{1}{3^{n}}\, \frac{n!}{j!\,k!\,(n-j-k)!} \right\}^{2}.$

Within the braces we have the terms of a trinomial distribution, and
we know that they add to unity. Hence the sum of the squares is
smaller than the maximum term within braces, and the latter is attained
when both $j$ and $k$ are close to $n/3$. Stirling's formula shows that this
maximum is of the order of magnitude $n^{-1}$, and therefore $u_{2n}$ is of the
magnitude $n^{-3/2}$ so that $\sum u_{2n}$ converges as asserted.
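The behavior of the two series can also be examined numerically. The following Python sketch is only an illustration and not part of the proof; the function names are hypothetical. It evaluates $u_{2n}$ exactly from the sums (7.1) and (7.2), compares them with the closed form for the plane and with the form (7.3) for space, and accumulates partial sums.

```python
from fractions import Fraction
from math import comb, factorial

def u2n_plane(n):
    """u_{2n} in two dimensions, computed from the sum in (7.1)."""
    total = sum(Fraction(factorial(2 * n),
                         factorial(k) ** 2 * factorial(n - k) ** 2)
                for k in range(n + 1))
    return total / 4 ** (2 * n)

def u2n_space(n):
    """u_{2n} in three dimensions, computed from the sum in (7.2)."""
    total = Fraction(0)
    for j in range(n + 1):
        for k in range(n - j + 1):
            m = n - j - k
            total += Fraction(factorial(2 * n),
                              (factorial(j) * factorial(k) * factorial(m)) ** 2)
    return total / 6 ** (2 * n)

def u2n_space_73(n):
    """The same quantity written in the form (7.3)."""
    total = Fraction(0)
    for j in range(n + 1):
        for k in range(n - j + 1):
            m = n - j - k
            term = Fraction(factorial(n),
                            factorial(j) * factorial(k) * factorial(m) * 3 ** n)
            total += term ** 2
    return Fraction(comb(2 * n, n), 2 ** (2 * n)) * total

def partial_sum(u, n_max):
    """u(1) + ... + u(n_max), returned as a float."""
    return float(sum(u(n) for n in range(1, n_max + 1)))

if __name__ == "__main__":
    for n_max in (10, 20, 40):
        print(n_max, partial_sum(u2n_plane, n_max), partial_sum(u2n_space, n_max))
```

In the plane the partial sums keep increasing slowly with the number of terms, as one expects from terms of order $1/n$; in space they remain below the limit 0.53 quoted in the theorem.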

Polya’s theorem is analogous to the facts concerning multiple coin 
tossings discussed in example XIII(@.c). 

We conclude this section with another problem which generalizes
the concept of absorbing barriers. Consider the case of two dimensions
where instead of the interval $0 \le x \le a$ we have a plane domain D,
that is, a collection of points with integral-valued coordinates. Each
point has four neighbors, but for some points of D one or more of the
neighbors lie outside D. Such points form the boundary of D, and all
other points are called interior points. In the one-dimensional case
the two barriers form the boundary, and our problem consisted in find-
ing the probability that, starting from $z$, the particle will reach the
boundary point 0 before reaching $a$. By analogy, we now ask for the
probability that the particle will reach a certain section of the boundary
before reaching any boundary point that is not in this section. This
means that we divide all boundary points into two sets $B'$ and $B''$. If
$(x, y)$ is an interior point, we ask for the probability $u(x, y)$ that,
starting from $(x, y)$, the particle will reach a point of $B'$ before reaching
a point of $B''$. In particular, if $B'$ consists of a single point, then
$u(x, y)$ is the probability that the particle will, sooner or later, be ab-
sorbed at that particular point.

Let $(x, y)$ be an interior point. The first step takes the particle
from $(x, y)$ to one of the four neighbors $(x \pm 1, y)$, $(x, y \pm 1)$, and if all
four of them are interior points, we must have

(7.4)  $u(x, y) = \tfrac14 [\,u(x+1, y) + u(x-1, y) + u(x, y+1) + u(x, y-1)\,].$




This is a partial difference equation which takes the place of (2.1) (with
$p = q = \tfrac12$). If $(x+1, y)$ is a boundary point, then its contribution
$u(x+1, y)$ must be replaced by 1 or 0, according to whether $(x+1, y)$
belongs to $B'$ or $B''$. Hence (7.4) will be valid for all interior points if
we agree that for a boundary point $(\xi, \eta)$ we put $u(\xi, \eta) = 1$ if $(\xi, \eta)$ is in
$B'$ and $u(\xi, \eta) = 0$ if $(\xi, \eta)$ is in $B''$. This convention takes the place of
the boundary conditions (2.2).

In (7.4) we have a system of linear equations for the unknowns
$u(x, y)$; to each interior point there correspond one unknown and one
equation. The system is non-homogeneous, since in it there appears at
least one boundary point $(\xi, \eta)$ of $B'$ and it gives rise to a contribution $\tfrac14$
on the right side. If the domain D is finite, there are as many equations
as unknowns, and it is well known that the system has a unique solu-
tion if, and only if, the corresponding homogeneous system (with
$u(\xi, \eta) = 0$ for all boundary points) has no non-vanishing solution.
Now $u(x, y)$ is the mean of the four neighboring values $u(x \pm 1, y)$,
$u(x, y \pm 1)$ and cannot exceed all four. In other words, $u(x, y)$ has
neither a maximum nor a minimum in the strict sense, and the greatest
and the smallest value occur at boundary points. Hence, if all bound-
ary values vanish, so does $u(x, y)$ at all interior points, which proves
the existence and uniqueness of the solution of (7.4). Since the bound-
ary values are 0 and 1, all values $u(x, y)$ lie between 0 and 1, as is re-
quired for probabilities. These statements are true also for the case
of infinite domains, as will be seen from a general theorem on infinite
Markov chains.⁹
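For a small finite domain the system (7.4) is easily solved numerically. The following Python sketch is an illustration under stated assumptions, not a method taken from the text: the function name and its arguments are hypothetical, and the system is solved by repeatedly replacing each interior value by the mean of its four neighbors (Gauss-Seidel sweeps) rather than by elimination.

```python
def absorption_probabilities(domain, b_prime, sweeps=2000):
    """Approximate the solution u(x, y) of (7.4) on a finite domain.

    domain  : set of lattice points (x, y) forming D
    b_prime : set of boundary points counted as B'; every other
              boundary point is treated as belonging to B''
    Returns a dict mapping each interior point to u(x, y).
    """
    def neighbors(pt):
        x, y = pt
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

    # Interior points are those whose four neighbors all belong to D.
    interior = [pt for pt in domain
                if all(nb in domain for nb in neighbors(pt))]
    # Boundary convention: u = 1 on B', u = 0 on B''.
    u = {pt: (1.0 if pt in b_prime else 0.0) for pt in domain}
    for _ in range(sweeps):
        for pt in interior:
            u[pt] = sum(u[nb] for nb in neighbors(pt)) / 4.0
    return {pt: u[pt] for pt in interior}

# Example: the square 0 <= x, y <= 4 with B' taken as the side x = 0.
square = {(x, y) for x in range(5) for y in range(5)}
left_side = {(0, y) for y in range(5)}
u = absorption_probabilities(square, left_side)
```

By the symmetry of the square, the value at the center (2, 2) should be close to 1/4, since each of the four sides is equally likely to be reached first. All computed values lie between 0 and 1, in agreement with the argument above.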


8. THE GENERALIZED ONE-DIMENSIONAL RANDOM WALK 
(SEQUENTIAL SAMPLING) 


We now return to one dimension but abandon the restriction that
the particle moves in unit steps. Instead, at each step the particle shall
have probability $p_k$ to move from any point $x$ to $x + k$, where the integer
$k$ may be zero, positive, or negative. We shall investigate the following
ruin problem: The particle starts from a position $z$ such that $0 < z < a$;
we seek the probability $u_z$ that the particle will arrive at some position
$\le 0$ before reaching any position $\ge a$. In other words, the position
of the particle at time $n$ is the point $z + X_1 + X_2 + \cdots + X_n$ of the
x-axis, where the $\{X_k\}$ are mutually independent random variables
with the common distribution $\{p_k\}$; the process stops when for the first
time either $X_1 + \cdots + X_n \le -z$ or $X_1 + \cdots + X_n \ge a - z$.


9 Explicit solutions are known in only a few cases and are always very compli-
cated. Solutions for the case of rectangular domains, infinite strips, etc., will be
found in the paper by McCrea and Whipple cited above.

This problem has attracted widespread interest in connection with
sequential sampling. There the $X_k$ represent certain characteristics of
samples or observations. Measurements are taken until a sum
$X_1 + \cdots + X_n$ falls outside two preassigned limits (our $-z$ and $a - z$).
In the first case the procedure leads to what is technically known as
rejection, in the second case to acceptance. The first sampling pro-
cedure of this kind was described by W. Bartky;¹⁰ the general theory
was outlined by A. Wald, to whom the formulation above is due.¹¹

Without loss of generality we shall suppose that steps are possible
in both the positive and negative directions. Otherwise we would
have either $u_z = 0$ or $u_z = 1$ for all $z$.

The probability of ruin at the first step is obviously

(8.1)  $r_z = p_{-z} + p_{-z-1} + p_{-z-2} + \cdots$

(a quantity which may be zero). The random walk continues only if
the particle moved to a position $x$ with $0 < x < a$; the probability of
a jump from $z$ to $x$ is $p_{x-z}$, and the probability of subsequent ruin is
then $u_x$. Therefore

(8.2)  $u_z = \sum_{x=1}^{a-1} u_x\, p_{x-z} + r_z.$

Once more we have here $a - 1$ linear equations for $a - 1$ unknowns
$u_z$. The system is non-homogeneous, since at least for $z = 1$ the
probability $r_1$ is different from zero (steps in the negative direction
being possible, which obviously implies $r_1 > 0$). We claim that the
corresponding homogeneous system

(8.3)  $u_z = \sum_{x=1}^{a-1} u_x\, p_{x-z}$

has no solution except 0.

In fact, if it had another solution, one of the values $u_z$ would be
largest in absolute value, say $u_z = M > 0$. Suppose first that $p_{-1} \ne 0$.
Since the coefficients $p_{x-z}$ in (8.3) add to at most unity, the equation
is possible only if all those $u_x$ which actually appear on the right
side (with a coefficient different from zero) equal $M$, and if their
coefficients add to 1. Hence $u_{z-1} = M$, and, arguing the same way,
$u_{z-2} = u_{z-3} = \cdots = u_1 = M$. However, for $z = 1$ the coefficients
$p_{x-z}$ in (8.3) add to less than unity, so that $M$ must be zero. The
same argument obviously applies also if $p_{-1} = 0$, since we can replace
$p_{-1}$ by some positive coefficient $p_k$ with $k < 0$.

10 W. Bartky, Multiple sampling with constant probability, Annals of Mathe-
matical Statistics, vol. 14 (1943), pp. 363-377. It is described in example XV (2.3).
11 A. Wald, On cumulative sums of random variables, Annals of Mathematical
Statistics, vol. 15 (1944), pp. 283-296. The methods described in the present book
are different from Wald's. See also Wald's book, Sequential analysis, John Wiley
& Sons, New York, 1947.

It follows that (8.2) has a unique solution, and thus our problem is
determined. Equation (8.2) plays the role of the difference equation
(2.1). Again we can simplify the writing by introducing the boundary
conditions

(8.4)  $u_x = 1$ if $x \le 0$,
       $u_x = 0$ if $x \ge a$.

Then (8.2) can be written in the form

(8.5)  $u_z = \sum u_x\, p_{x-z},$

the summation now extending over all $x$ (for $x \ge a$ we have no con-
tribution owing to the second condition (8.4); the contributions for
$x \le 0$ add to $r_z$ owing to the first condition).
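For small $a$ the system (8.2) can be solved directly. The following Python sketch is one possible way to do so and is not taken from the text: the function name is hypothetical, the step distribution is passed as a dictionary from $k$ to $p_k$, and exact rational arithmetic with Gauss-Jordan elimination is used so that the results can be compared with closed formulas.

```python
from fractions import Fraction

def ruin_probabilities(p, a):
    """Solve the linear system (8.2) exactly.

    p : dict mapping each possible step k to its probability p_k
    a : upper barrier; the unknowns are u_1, ..., u_{a-1}
    Returns a dict {z: u_z} for 0 < z < a.
    """
    size = a - 1
    rows = []
    for z in range(1, a):
        # Row z encodes  u_z - sum_x u_x p_{x-z} = r_z.
        row = [Fraction(0)] * (size + 1)
        for x in range(1, a):
            row[x - 1] = -Fraction(p.get(x - z, 0))
        row[z - 1] += 1
        # r_z from (8.1): probability of a jump from z to a position <= 0.
        row[size] = sum((Fraction(pk) for k, pk in p.items() if z + k <= 0),
                        Fraction(0))
        rows.append(row)
    # Gauss-Jordan elimination; exact because all entries are Fractions.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return {z: rows[z - 1][size] for z in range(1, a)}

# Unit steps with p = q = 1/2 reproduce the classical value u_z = 1 - z/a.
half = Fraction(1, 2)
classical = ruin_probabilities({1: half, -1: half}, 5)
```

With steps $-2, -1, 1, 2$ of probability $\tfrac14$ each and $a = 3$, the two equations of (8.2) read $u_1 = \tfrac12 + \tfrac14 u_2$ and $u_2 = \tfrac14 + \tfrac14 u_1$, so that $u_1 = \tfrac35$ and $u_2 = \tfrac25$; the function returns the same values.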

For large $a$ it is cumbersome to solve $a - 1$ linear equations directly,
and it is preferable to use the method of particular solutions analogous
to the procedure of section 2. It works whenever the probability dis-
tribution $\{p_k\}$ has relatively few positive terms. Suppose that only
the $p_k$ with $-\nu \le k \le \mu$ are different from zero, so that the largest
possible jumps in the positive and negative directions are $\mu$ and $\nu$,
respectively. The characteristic equation

(8.6)  $\sum p_k s^k = 1$

is equivalent to an algebraic equation of degree $\nu + \mu$. If $s$ is a root
of (8.6), then $u_z = s^z$ is a formal solution of (8.5) for all $z$, but this
solution does not satisfy the boundary conditions (8.4). If (8.6) has
$\mu + \nu$ distinct roots $s_1, s_2, \ldots$, then the linear combination

(8.7)  $u_z = \sum A_k s_k^z$

is again a formal solution of (8.5) for all $z$, but we must adjust the
constants $A_k$ to satisfy the boundary conditions. Now for $0 < z < a$
only values $x$ with $-\nu + 1 \le x \le a + \mu - 1$ appear in (8.5). It suf-
fices therefore to satisfy the boundary conditions (8.4) for $x = 0, -1,
-2, \ldots, -\nu + 1$, and $x = a, a+1, \ldots, a + \mu - 1$, so that we have $\mu + \nu$
conditions in all. If $s_1$ is a double root of (8.6), we lose one constant,
but in this case it is easily seen that $u_z = z s_1^z$ is another formal solu-
tion. In every case the $\mu + \nu$ boundary conditions determine the $\mu + \nu$
arbitrary constants.

Example. Suppose that each individual step takes the particle to
one of the four nearest positions, and we let $p_{-2} = p_{-1} = p_1 = p_2 = \tfrac14$.
The characteristic equation (8.6) is $s^{-2} + s^{-1} + s + s^2 = 4$. To solve
it we put $t = s + s^{-1}$: with this substitution our equation becomes
$t^2 + t = 6$, which has the roots $t = 2, -3$. Solving $t = s + s^{-1}$ for $s$,
we find the four roots

(8.8)  $s_1 = s_2 = 1, \qquad s_3 = \frac{-3 + \sqrt{5}}{2} = s_4^{-1}, \qquad s_4 = \frac{-3 - \sqrt{5}}{2} = s_3^{-1}.$

Since $s_1$ is a double root, the general solution of (8.5) in our case is

(8.9)  $u_z = A_1 + A_2 z + A_3 s_3^z + A_4 s_4^z.$

The boundary conditions are $u_0 = u_{-1} = 1$, and $u_a = u_{a+1} = 0$. They
lead to four linear equations for the coefficients $A_k$ and to the final
solution


= z , (2z —a)(ss* — 84%) — a(s3°~* — s39) 
6:10)" t= 1— a + a{(a + 2)(s3* — s4°) — a(sgt? — s¢**)} 


with s3 and s4 given by (8.8). 
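The determination of the constants in (8.9) can also be carried out numerically. The following Python sketch is an illustration only; the function names are hypothetical, and instead of using a closed formula it solves the four linear equations given by the boundary conditions $u_0 = u_{-1} = 1$, $u_a = u_{a+1} = 0$ by Gaussian elimination in floating-point arithmetic.

```python
from math import sqrt

def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def example_ruin_probability(a):
    """Return a function z -> u_z for the example with steps -2, -1, 1, 2."""
    s3 = (-3 + sqrt(5)) / 2
    s4 = (-3 - sqrt(5)) / 2

    def basis(z):
        # The four formal solutions 1, z, s3^z, s4^z appearing in (8.9);
        # z must be an integer because s3 and s4 are negative.
        return [1.0, float(z), s3 ** z, s4 ** z]

    points = [0, -1, a, a + 1]      # positions where (8.4) is imposed
    values = [1.0, 1.0, 0.0, 0.0]   # u_0 = u_{-1} = 1, u_a = u_{a+1} = 0
    coeffs = solve_linear([basis(z) for z in points], values)
    return lambda z: sum(c * b for c, b in zip(coeffs, basis(z)))

u = example_ruin_probability(3)
```

For $a = 3$ the direct solution of (8.2) is $u_1 = \tfrac35$, $u_2 = \tfrac25$, and the values `u(1)` and `u(2)` agree with it up to rounding. For large $a$ the powers of $s_4$ grow quickly, so floating-point results should be treated with some caution.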


Numerical Approximations. Usually it is cumbersome to find all the roots,
but rather satisfactory approximations can be obtained in a surprisingly simple
way. Consider first the case where the probability distribution $\{p_k\}$ has mean
zero. Then the characteristic equation (8.6) has a double root at $s = 1$, and
$A + Bz$ is a formal solution of (8.5). Of course, the two constants $A$ and $B$ do not
suffice to satisfy the $\mu + \nu$ boundary conditions (8.4). However, if we determine
$A$ and $B$ so that $A + Bz$ vanishes for $z = a + \mu - 1$ and equals 1 for $z = 0$,
then $A + Bz \ge 1$ for $z \le 0$ and $A + Bz \ge 0$ for $a \le z < a + \mu$, so that $A + Bz$
satisfies the boundary conditions (8.4) with the equality sign replaced by "greater
than or equal to." The difference $A + Bz - u_z$ is therefore a formal solution of
(8.5) with non-negative boundary values, whence $A + Bz - u_z \ge 0$. In like man-
ner we can get a lower bound for $u_z$ by determining $A$ and $B$ so that $A + Bz$
vanishes for $z = a$ and equals 1 for $z = -\nu + 1$. Hence we have

(8.11)  $\frac{a - z}{a + \nu - 1} \le u_z \le \frac{a + \mu - z - 1}{a + \mu - 1}.$

This estimate is excellent provided $a$ is large as compared to $\mu + \nu$. (Of course,
$u_z \approx 1 - z/a$ is a better approximation but does not give precise bounds.)
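The quality of the bounds (8.11) can be checked on a small case. The following Python sketch is an illustration with hypothetical function names: it approximates $u_z$ by repeated substitution in (8.5), with the boundary conditions (8.4) built in, and then evaluates both sides of (8.11) for a zero-mean step distribution.

```python
def ruin_by_iteration(p, a, sweeps=5000):
    """Approximate u_z from (8.5) by repeated substitution.

    p is a dict {k: p_k}. The boundary conditions (8.4) are built in:
    u_x = 1 for x <= 0 and u_x = 0 for x >= a.
    """
    u = {z: 0.0 for z in range(1, a)}

    def value(x):
        if x <= 0:
            return 1.0
        if x >= a:
            return 0.0
        return u[x]

    for _ in range(sweeps):
        for z in range(1, a):
            u[z] = sum(pk * value(z + k) for k, pk in p.items())
    return u

def bounds_8_11(z, a, mu, nu):
    """Lower and upper bounds of (8.11) for a zero-mean step distribution."""
    lower = (a - z) / (a + nu - 1)
    upper = (a + mu - z - 1) / (a + mu - 1)
    return lower, upper

# Steps -2, -1, 1, 2 with probability 1/4 each, so that mu = nu = 2.
steps = {-2: 0.25, -1: 0.25, 1: 0.25, 2: 0.25}
a = 12
u = ruin_by_iteration(steps, a)
table = [(z, bounds_8_11(z, a, 2, 2)[0], u[z], bounds_8_11(z, a, 2, 2)[1])
         for z in range(1, a)]
```

Each row of `table` lists $z$, the lower bound, the computed $u_z$, and the upper bound. The fixed number of sweeps is an assumption that is adequate for small $a$; for large $a$ more sweeps, or a direct solution of (8.2), would be needed.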

Next, consider the general case where the mean of the distribution $\{p_k\}$ is not
zero. The characteristic equation (8.6) has then a simple root at $s = 1$. The left
side of (8.6) approaches $\infty$ as $s \to 0$ and as $s \to \infty$. For positive $s$ the curve
$y = \sum p_k s^k$ is continuous and convex, and since it intersects the line $y = 1$ at $s = 1$,
there exists exactly one more intersection. Therefore, the characteristic equation
(8.6) has exactly two positive roots, 1 and $s_1$. As before, we see that $A + Bs_1^z$
is a formal solution of (8.5), and we can apply our previous argument to this solu-
tion instead of $A + Bz$. We find in this case

(8.12)  $\frac{s_1^{a} - s_1^{z}}{s_1^{a} - s_1^{-\nu+1}} \le u_z \le \frac{s_1^{a+\mu-1} - s_1^{z}}{s_1^{a+\mu-1} - 1}$

and have the

Theorem. The solution of our ruin problem satisfies the inequalities (8.11) if $\{p_k\}$
has zero mean, and (8.12) otherwise. Here $s_1$ is the unique positive root different from
1 of (8.6), and $\mu$ and $-\nu$ are defined, respectively, as the largest and smallest subscript
for which $p_k \ne 0$.

Let $m = \sum k p_k$ be the expected gain in a single trial (or expected length of a single
step). It is easily seen from (8.6) that $s_1 > 1$ or $s_1 < 1$ according to whether
$m < 0$ or $m > 0$. Letting $a \to \infty$, we conclude from our theorem that in a game
against an infinitely rich adversary the probability of an ultimate ruin is one if and
only if $m \le 0$.
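The root $s_1$ is easily located numerically. The following Python sketch is an illustration with hypothetical function names; it uses bisection, which is one of several possible methods, and relies on the convexity of the curve $y = \sum p_k s^k$ described above. When $m < 0$ it replaces $s$ by $1/s$, which reverses the direction of all steps and reduces the problem to the case $m > 0$.

```python
def characteristic(p, s):
    """Left side of (8.6): the sum of p_k * s**k over all steps k."""
    return sum(pk * s ** k for k, pk in p.items())

def second_positive_root(p, iterations=200):
    """Locate the positive root s_1 != 1 of (8.6) by bisection.

    Assumes that steps in both directions are possible and that the
    mean m = sum k p_k is not zero.
    """
    m = sum(k * pk for k, pk in p.items())
    if m == 0:
        raise ValueError("zero mean: s = 1 is a double root of (8.6)")
    if m < 0:
        # Replacing s by 1/s reverses the steps and the sign of the mean.
        mirrored = {-k: pk for k, pk in p.items()}
        return 1.0 / second_positive_root(mirrored, iterations)

    def g(s):
        return characteristic(p, s) - 1.0

    hi = 0.5
    while g(hi) >= 0.0 and hi < 1.0:   # move toward 1 until the curve is below 1
        hi = (hi + 1.0) / 2.0
    if hi >= 1.0:
        raise ValueError("s_1 is too close to 1 to be separated numerically")
    lo = hi
    while g(lo) <= 0.0:                # move toward 0 until the curve is above 1
        lo /= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Unit steps with p = 0.6, q = 0.4: the second root is q/p = 2/3.
root = second_positive_root({1: 0.6, -1: 0.4})
```

When $m > 0$ we have $s_1 < 1$, and letting $a \to \infty$ in (8.12) gives the limits $s_1^{z+\nu-1}$ and $s_1^{z}$ for the lower and upper bounds, so that the computed root directly yields an estimate of the probability of ruin against an infinitely rich adversary.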

The duration of game can be discussed by similar methods (cf. problem 4). 


9. PROBLEMS FOR SOLUTION 


1. Consider the ruin problem of sections 2 and 3 for the case of a modified ran-
dom walk in which the particle moves a unit step to the right or left, or stays
at its present position with probabilities $\alpha$, $\beta$, $\gamma$, respectively ($\alpha + \beta + \gamma = 1$).
(In gambling terminology, the bet may result in a tie.)

2. Consider the ruin problem of sections 2 and 3 for the case where the origin 
is an elastic barrier (as defined in section 1). The difference equations for the 
probability of ruin (absorption at the origin) and for the expected duration 
are the same, but with new boundary conditions. 

3. A particle moves at each step two units to the right or one unit to the left,
with corresponding probabilities $p$ and $q$ ($p + q = 1$). If the starting position
is $z > 0$, find the probability that the particle will ever reach the origin. (This
is the ruin problem against an infinitely rich adversary.)

Hint: The equation corresponding to (2.1) has the particular solution $q_z = 1$
and two particular solutions of the form $\lambda^z$, where $\lambda$ satisfies a quadratic equa-
tion.

4. In the generalized random-walk problem of section 8 put [in analogy with
(8.1)] $\rho_z = p_{a-z} + p_{a+1-z} + p_{a+2-z} + \cdots$, and let $d_{z,n}$ be the probability
that the game lasts for exactly $n$ steps. Show that for $n \ge 1$

$d_{z,n+1} = \sum_{x=1}^{a-1} d_{x,n}\, p_{x-z}$

with $d_{z,1} = r_z + \rho_z$. Hence prove that the generating function $d_z(s) = \sum d_{z,n} s^n$
is the solution of the system of linear equations

$s^{-1} d_z(s) - \sum_{x=1}^{a-1} d_x(s)\, p_{x-z} = r_z + \rho_z.$

By differentiation it follows that the expected duration $e_z$ is the solution of

$e_z = \sum_{x=1}^{a-1} e_x\, p_{x-z} + 1.$

5. In the random walk with absorbing barriers at the points 0 and $a$ and
with initial position $z$, let $w_{z,n}(x)$ be the probability that the nth step takes
the particle to the position $x$. Find the difference equations and boundary
conditions which determine $w_{z,n}(x)$.

6. Continuation. Modify the boundary conditions for the case of two
reflecting barriers (i.e., elastic barriers with $\delta = 1$).

Note: In the following problems $v_{x,n}$ is the probability (6.1) that in an unrestricted
random walk starting at the origin the nth step takes the particle to the position $x$.

7. Method of images. Let $p = q = \tfrac12$. In a random walk in $(0, \infty)$ with
an absorbing barrier at the origin and initial position at $z$, let $u_{z,n}(x)$ be the
probability that the nth step takes the particle to the position $x$. Show that
$u_{z,n}(x) = v_{x-z,n} - v_{x+z,n}$. (Hint: Show that a difference equation corre-
sponding to (4.1) and the appropriate boundary conditions are satisfied.)

8. Continuation. If the origin is a reflecting barrier, then

$u_{z,n}(x) = v_{x-z,n} + v_{x+z-1,n}.$

9. Continuation. If the random walk is restricted to $(0, a)$ and both bar-
riers are absorbing, then

(9.1)  $u_{z,n}(x) = \sum_{k} \{ v_{x-z-2ka,n} - v_{x+z-2ka,n} \},$

the summation extending over all $k$, positive or negative (only finitely many
terms are different from zero). If both barriers are reflecting, equation (9.1)
holds with minus replaced by plus.

10. Distribution of maxima. In a symmetric unrestricted random walk
starting at the origin let $M_n$ be the maximum abscissa of the particle at times
$0, 1, 2, \ldots, n$. Using the formula of problem 7, show that

(9.2)  $P\{M_n = z\} = v_{z,n} + v_{z+1,n}.$

11. Let $V_x(s) = \sum v_{x,n} s^n$ (cf. the note preceding problem 7). Prove that
$V_x(s) = V_0(s)\lambda_2^{-x}(s)$ when $x \le 0$ and $V_x(s) = V_0(s)\lambda_1^{-x}(s)$ when $x \ge 0$, where
$\lambda_1(s)$ and $\lambda_2(s)$ are defined in (4.8). Moreover, $V_0(s) = (1 - 4pqs^2)^{-1/2}$.

Note. These relations follow directly from the fact that $\lambda_1^{-1}(s)$ and $\lambda_2(s)$ are
generating functions of first-passage times as explained at the conclusion of
section 4.

12. In a random walk in $(0, \infty)$ with an absorbing barrier at the origin and
initial position at $z$, let $u_{z,n}(x)$ be the probability that the nth step takes the
particle to the position $x$, and let

(9.3)  $U_z(s; x) = \sum_{n=0}^{\infty} u_{z,n}(x)\, s^n.$

Using problem 11, show that $U_z(s; x) = V_{x-z}(s) - \lambda_2^{z}(s) V_x(s)$. Conclude

(9.4)  $u_{z,n}(x) = v_{x-z,n} - (q/p)^{z} \cdot v_{x+z,n}.$

Compare with the result of problem 7 and derive (9.4) from the latter by
combinatorial methods.

12 Problems 7-9 are examples of the method of images. The term $v_{x-z,n}$ cor-
responds to a particle in an unrestricted random walk, and $v_{x+z,n}$ to an "image
point." In equation (9.1) we find image points starting from various positions,
obtained by repeated reflections at both boundaries. In problems 12 and 13 we
get the general result for the unsymmetric random walk using generating functions.
In the theory of difference equations the method of images is always ascribed to
Lord Kelvin. The equivalent reflection principle is generally attributed to D.
André. See footnote 4 of chapter III.


13. Alternative formula for the probability of ruin (5.7). Expanding (4.11)
into a geometric series, prove that

$u_{z,n} = \sum_{k=0}^{\infty} \left(\frac{p}{q}\right)^{ka} w_{z+2ka,n} - \sum_{k=1}^{\infty} \left(\frac{p}{q}\right)^{ka-z} w_{2ka-z,n}$

with $w_{z,n}$ defined in (5.9).

14. If the passage to the limit of section 6 is applied to the expression for
$u_{z,n}$ given in the preceding problem, show that the probability of absorption
during a short time interval of length $\Delta t$ is asymptotically¹³

$\tfrac12\, \Delta t\, (\pi D t^3)^{-1/2}\, e^{-(ct+z)c/D} \sum_{k=-\infty}^{\infty} (z + 2ka)\, e^{-\frac14 (z+2ka)^2/Dt}.$

(Hint: Apply the normal approximation to the binomial distribution.)

15. Renewal method for the ruin problem. In the random walk with two
absorbing barriers let $u_{z,n}$ and $u_{z,n}^*$ be, respectively, the probabilities of ab-
sorption at the left and the right barriers. By a proper interpretation prove
the truth of the following two equations:

$V_{-z}(s) = U_z(s) V_0(s) + U_z^*(s) V_{-a}(s),$
$V_{a-z}(s) = U_z(s) V_a(s) + U_z^*(s) V_0(s).$

By solving this system for $U_z(s)$, derive (4.11).

16. Let $u_{z,n}(x)$ be the probability that the particle, starting from $z$, will at
the nth step be at $x$ without having previously touched the absorbing barriers.
Using the notations of problem 15, show that for the corresponding generating
function $U_z(s; x) = \sum u_{z,n}(x) s^n$ we have

$U_z(s; x) = V_{x-z}(s) - U_z(s) V_x(s) - U_z^*(s) V_{x-a}(s).$

(No calculations are required.)

17. Continuation. The generating function $U_z(s; x)$ of the preceding prob-
lem can be obtained by putting $U_z(s; x) = V_{x-z}(s) - A\lambda_1^{z}(s) - B\lambda_2^{z}(s)$ and
determining the constants so that the boundary conditions $U_z(s; x) = 0$ for
$z = 0$ and $z = a$ are satisfied. With reflecting barriers the boundary condi-
tions are $U_0(s; x) = U_1(s; x)$ and $U_a(s; x) = U_{a-1}(s; x)$.

13 The agreement of the new formula with the limiting form (6.10) is a well-
known fact of the theory of theta functions.

14 Problems 15-17 contain a new and independent derivation of the main results
concerning random walks in one dimension.

18.¹⁵ A symmetric unrestricted random walk starts at the origin. The
probability that the rth return to the origin occurs at the nth step equals the
probability that the first passage through $r$ occurs at the $(n-r)$th step.
(Hint: Compare the generating functions.)


19. Prove the formula

$v_{x,n} = (2\pi)^{-1}\, 2^{n}\, p^{(n+x)/2} q^{(n-x)/2} \int_{-\pi}^{\pi} \cos^{n} t \cdot \cos tx \cdot dt$

by showing that the appropriate difference equation is satisfied. Conclude that

$V_x(s) = (2\pi)^{-1} \left(\frac{p}{q}\right)^{x/2} \int_{-\pi}^{\pi} \frac{\cos tx}{1 - 2\sqrt{pq} \cdot s \cdot \cos t}\, dt.$

20. In a three-dimensional symmetric random walk the particle has prob-
ability one to pass infinitely often through any particular line $x = m$, $y = n$.
(Hint: Cf. problem 1.)

21. In a two-dimensional symmetric random walk starting at the origin
the probability that the nth step takes the particle to $(x, y)$ is

$(2\pi)^{-2}\, 2^{-n} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} (\cos\alpha + \cos\beta)^{n} \cdot \cos x\alpha \cdot \cos y\beta \cdot d\alpha\, d\beta.$

Verify this formula and find the analogue for three dimensions. (Hint: Check
that the expression satisfies the proper difference equation.)

22. In a two-dimensional symmetric random walk let $D_n^2 = x^2 + y^2$ be
the square of the distance of the particle from the origin at time $n$. Prove
$E(D_n^2) = n$. [Hint: Calculate $E(D_{n+1}^2 - D_n^2)$.]

23. In a symmetric random walk in $d$ dimensions the particle has probability
1 to return infinitely often to a position already previously occupied. (Hint:
At each step the probability of moving to a new position is at most
$(2d - 1)/(2d)$.)


15 This is theorem 3 of chapter III, section 4.


CHAPTER XV 


Markov Chains 


1. DEFINITION 

Up to now we have been concerned mostly with independent trials
which can be described as follows. A set of possible outcomes $E_1$,
$E_2, \ldots$ (finite or infinite in number) is given, and with each there is
associated a probability $p_k$; the probabilities of sample sequences are
defined by the multiplicative property $P\{(E_{j_0}, E_{j_1}, \ldots, E_{j_n})\} =
p_{j_0} p_{j_1} \cdots p_{j_n}$. In the theory of Markov¹ chains we consider the
simplest generalization which consists in permitting the outcome of any
trial to depend on the outcome of the directly preceding trial (and only
on it). The outcome $E_k$ is no longer associated with a fixed probability
$p_k$, but to every pair $(E_j, E_k)$ there corresponds a conditional probability
$p_{jk}$; given that $E_j$ has occurred at some trial, the probability of $E_k$ at
the next trial is $p_{jk}$. In addition to the $p_{jk}$ we must be given the prob-
ability $a_k$ of the outcome $E_k$ at the initial trial. For $p_{jk}$ to have the
meaning attributed to them, the probabilities of sample sequences
corresponding to two, three, or four trials must be defined by

$P\{(E_j, E_k)\} = a_j p_{jk}, \qquad P\{(E_j, E_k, E_r)\} = a_j p_{jk} p_{kr},$
$P\{(E_j, E_k, E_r, E_s)\} = a_j p_{jk} p_{kr} p_{rs},$

and generally

(1.1)  $P\{(E_{j_0}, E_{j_1}, \ldots, E_{j_n})\} = a_{j_0}\, p_{j_0 j_1}\, p_{j_1 j_2} \cdots p_{j_{n-2} j_{n-1}}\, p_{j_{n-1} j_n}.$

Here the initial trial is numbered zero, so that trial number one is the
second trial. (This convention is convenient and has been introduced
tacitly in the preceding chapter.)


Examples. (a) Every Markov chain is equivalent to an urn model
as follows. Each occurring subscript is represented by an urn, and
each urn contains balls marked $E_1, E_2, \ldots$. The composition of the
urns remains fixed, but it varies from urn to urn; in the jth urn the
probability to draw a ball marked $E_k$ is $p_{jk}$. At the initial, or zero-th,
trial an urn is chosen in accordance with the probability distribution
$\{a_j\}$. From that urn a ball is drawn at random, and if it is marked $E_j$,
the next drawing is made from the jth urn, etc. Obviously with this
procedure the probability of a sequence $(E_{j_0}, \ldots, E_{j_n})$ is given by (1.1).
We see that the notion of a Markov chain is not more general than urn
models, but the new symbolism will prove more practical and more
intuitive.

1 A. A. Markov (1856-1922).

(b) Independent trials are, of course, the special case of our scheme
with $p_{jk} = a_k$ for each $j$.


If $a_k$ is the probability of $E_k$ at the initial (or zero-th) trial, we must
have $a_k \ge 0$ and $\sum a_k = 1$. Moreover, whenever $E_j$ occurs it must be
followed by some $E_k$, and it is therefore necessary that for all $j$ and $k$

(1.2)  $p_{j1} + p_{j2} + p_{j3} + \cdots = 1, \qquad p_{jk} \ge 0.$

We want to show that for any numbers $a_k$ and $p_{jk}$ satisfying these
conditions, the assignment (1.1) is a permissible definition of probabil-
ities in the sample space corresponding to $n + 1$ trials. The numbers
defined in (1.1) being non-negative, we need only prove that they add
to unity. Fix first $j_0, j_1, \ldots, j_{n-1}$ and add the numbers (1.1) for all
possible $j_n$. Using (1.2) with $j = j_{n-1}$, we see immediately that the
sum equals $a_{j_0} p_{j_0 j_1} \cdots p_{j_{n-2} j_{n-1}}$. Thus the sum over all numbers (1.1)
does not depend on $n$, and since $\sum a_{j_0} = 1$, the sum equals unity for all $n$.

The definition (1.1) depends formally on the number of trials, but
our argument proves the mutual consistency of the definitions (1.1)
for all $n$. For example, to obtain the probability of the event "the
first two trials result in $(E_j, E_k)$," we have to fix $j_0 = j$ and $j_1 = k$, and
add the probabilities (1.1) for all possible $j_2, j_3, \ldots, j_n$. We have just
shown that the sum is $a_j p_{jk}$ and is thus independent of $n$. This means
that it is usually not necessary explicitly to refer to the number of
trials; the event $(E_{j_0}, \ldots, E_{j_r})$ has the same probability in all sample
spaces of more than $r$ trials. In connection with independent trials
it has been pointed out repeatedly that, from a mathematical point of
view, it is most satisfactory to introduce only the unique sample space
of unending sequences of trials and to consider the result of finitely
many trials as the beginning of an infinite sequence. This statement
holds true also for Markov chains. Unfortunately, sample spaces of
infinitely many trials lead beyond the theory of discrete probabilities
to which we are restricted in the present volume.

To summarize, our starting point is the following


Definition. A sequence of trials with possible outcomes $E_1, E_2, \ldots$
will be called a Markov chain² if the probabilities of sample sequences are
defined by (1.1) in terms of an initial probability distribution $\{a_k\}$ for
the states $E_k$ at time 0 and fixed conditional probabilities $p_{jk}$ of $E_k$, given
that $E_j$ has occurred at the preceding trial.


We shall now modify our terminology to conform to the usage in
physical applications. Instead of saying "the nth trial results in $E_k$,"
we shall say that at time $n$ the system is in state $E_k$. The conditional
probability $p_{jk}$ will be called the probability of the transition $E_j \to E_k$
(from state $E_j$ to state $E_k$).
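The definition (1.1) and the consistency argument can be checked on a small chain. The following Python sketch is an illustration only; the function names are hypothetical, states are numbered from 0 for convenience, and the initial distribution and transition probabilities are arbitrary values chosen for the example.

```python
from itertools import product

def path_probability(a, P, path):
    """Probability (1.1) of the sample sequence given by the state indices in path."""
    prob = a[path[0]]
    for j, k in zip(path, path[1:]):
        prob *= P[j][k]
    return prob

def total_probability(a, P, n):
    """Sum of the numbers (1.1) over all sequences of n + 1 trials."""
    states = range(len(a))
    return sum(path_probability(a, P, path)
               for path in product(states, repeat=n + 1))

def prefix_probability(a, P, prefix, n):
    """Probability that the first trials follow prefix, in the space of n + 1 trials."""
    states = range(len(a))
    extra = n + 1 - len(prefix)
    return sum(path_probability(a, P, tuple(prefix) + tail)
               for tail in product(states, repeat=extra))

# A chain with three states; the numbers are illustrative assumptions.
a = [0.2, 0.5, 0.3]
P = [[0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2],
     [0.5, 0.0, 0.5]]
totals = [total_probability(a, P, n) for n in range(5)]
first_two = [prefix_probability(a, P, (0, 2), n) for n in range(1, 5)]
```

Every entry of `totals` equals 1 up to rounding, and every entry of `first_two` equals $a_0 p_{02} = 0.06$ whatever the number of trials, which is the consistency property proved above.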


The transition probabilities $p_{jk}$ will be arranged in a matrix of transi-
tion probabilities

(1.3)  $P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & \cdots \\ p_{21} & p_{22} & p_{23} & \cdots \\ p_{31} & p_{32} & p_{33} & \cdots \\ \vdots & \vdots & \vdots & \end{pmatrix}$

where the first subscript stands for row, the second for column. Clearly
P is a square matrix with non-negative elements and unit row sums.
Such a matrix (finite or infinite) is called a stochastic matrix. Any
stochastic matrix can serve as a matrix of transition probabilities; together
with our initial distribution $\{a_k\}$ it completely defines a Markov chain
with states $E_1, E_2, \ldots$.

In some special cases it is convenient to number the states starting
with 0 rather than with 1. A zero row and zero column are then to
be added to P.
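For a finite matrix the defining properties of a stochastic matrix are easily tested. The following Python sketch is an illustration with hypothetical function names; the tolerance on the row sums is an assumption made to allow for floating-point rounding. The second function builds the matrix of example 1(b), in which every row equals the initial distribution.

```python
def is_stochastic(P, tol=1e-12):
    """True if P is a square matrix with non-negative entries and unit row sums."""
    size = len(P)
    for row in P:
        if len(row) != size:
            return False
        if any(entry < 0 for entry in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
    return True

def independent_trials_matrix(a):
    """Transition matrix with p_jk = a_k for each j (independent trials)."""
    return [list(a) for _ in a]

two_state = [[0.7, 0.3],
             [0.4, 0.6]]
independent = independent_trials_matrix([0.2, 0.5, 0.3])
```

Both `two_state` and `independent` pass the test; a matrix with a negative entry, a row sum different from 1, or a non-square shape does not.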


2. ILLUSTRATIVE EXAMPLES


This section contains examples which will familiarize the reader with
the notion of a Markov chain. To save space we shall refer to some
of them as the occasion arises, but the reader is advised not to store
the examples in his mind. For the classical example of card shuffling
see section 9.

2 This is not the standard terminology. We are here considering only a special
class of Markov chains, and, strictly speaking, here and in the following sections
the term Markov chain should always be qualified by adding the clause "with
constant transition probabilities." Actually, the general type of Markov chain
is rarely studied. It will be defined in section 10, where the Markov property will
be discussed in relation to general stochastic processes. There the reader will also
find examples of dependent trials that do not form Markov chains.

(a) Suppose that there are only two states $E_1$, $E_2$, also called
“success” and “failure.” The matrix P is of the form

$$
P = \begin{pmatrix} p & q \\ p' & q' \end{pmatrix}, \qquad q = 1 - p, \quad q' = 1 - p',
$$

and p, p' are the probabilities of success following success, and
success following failure, respectively. For a particular example, imagine
a ball moving with velocity ±1 in the direction of the x-axis. At times
1, 2, ... the ball reverses its direction with probability q, and keeps it
with probability p. If $E_1$ stands for velocity +1 and $E_2$ for $-1$, the
matrix of transition probabilities is of the form described with q' = p
and p' = q. (This experiment could be simulated by means of a large
regular pegboard.)

(b) Random walk with absorbing barriers. Let the possible states be
$E_0, E_1, \ldots, E_a$ and consider the matrix of transition probabilities

$$
P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\
0 & q & 0 & p & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & q & 0 & p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}
$$

From each of the “interior” states $E_1, \ldots, E_{a-1}$ transitions are
possible to the right and the left neighbors (with $p_{i,i+1} = p$ and $p_{i,i-1} = q$).
However, no transition is possible from either $E_0$ or $E_a$ to any other
state; the system may move from one state to another, but once $E_0$ or
$E_a$ is reached, the system stays there fixed forever. Clearly this
Markov chain differs only terminologically from the model of a random
walk with absorbing barriers at 0 and a discussed in the last
chapter. There the random walk started from a fixed point z of the
interval. In Markov chain terminology this amounts to choosing the
initial distribution so that $a_z = 1$ (and hence $a_x = 0$ for $x \neq z$). If we
had chosen the initial state at random we would have $a_k = (a+1)^{-1}$
for k = 0, 1, ..., a.

(c) Elastic barriers. We next consider a matrix which differs from
the preceding one only in the rows number 1 and $a - 1$. Choose
$0 \le \delta_0 \le 1$ and $0 \le \delta_a \le 1$ and set

$$
P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
(1-\delta_0)q & \delta_0 q & p & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & q & 0 & p & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & q & 0 & p & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & q & \delta_a p & (1-\delta_a)p \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 1
\end{pmatrix}
$$

The transition probabilities are the same as before except that from
$E_1$ a passage to $E_0$ has only probability $(1 - \delta_0)q$, and with probability
$\delta_0 q$ the system stays at $E_1$; a similar statement holds for $E_{a-1}$. For
$\delta_0 = \delta_a = 0$ our matrix is identical with the preceding one. When
$\delta_0 = \delta_a = 1$, no passage into $E_0$ and $E_a$ is possible; a system starting
at an interior state $E_j$ will move from state to state but never enter $E_0$
or $E_a$. In random-walk terminology this last situation corresponds to
reflecting barriers (cf. chapter XIV). In betting language the state of
the system represents the capital of a player in a game where the two
players own between them the amount a. Each time the first player
loses his last dollar, the adversary replaces it with probability $\delta_0$, and
with probability $1 - \delta_0$ the game terminates. With two reflecting
barriers the game never terminates.
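The matrices of examples (b) and (c) can be generated by one small routine, since the absorbing case is the special case $\delta_0 = \delta_a = 0$. The sketch below is illustrative only: the name `elastic_walk` and the numerical values are assumptions, and $a \ge 2$ is assumed so that an interior state exists.

```python
from fractions import Fraction

def elastic_walk(a, p, d0, da):
    """Transition matrix on states 0..a with elastic barriers (a >= 2).

    d0 and da play the roles of delta_0 and delta_a in the text;
    d0 = da = 0 gives absorbing barriers, d0 = da = 1 reflecting ones.
    """
    q = 1 - p
    P = [[Fraction(0)] * (a + 1) for _ in range(a + 1)]
    P[0][0] = P[a][a] = Fraction(1)
    for i in range(1, a):              # interior states step left or right
        P[i][i - 1] += q
        P[i][i + 1] += p
    # Row 1: a step into E_0 is cancelled with probability d0.
    P[1][0] = (1 - d0) * q
    P[1][1] += d0 * q
    # Row a-1: a step into E_a is cancelled with probability da.
    P[a - 1][a] = (1 - da) * p
    P[a - 1][a - 1] += da * p
    return P

p = Fraction(1, 3)                      # hypothetical value
P_absorbing = elastic_walk(4, p, 0, 0)
P_elastic = elastic_walk(4, p, Fraction(1, 2), 1)
```

With `da = 1` the last interior row places no mass on $E_a$, which is the reflecting barrier described above.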

(d) Cyclical random walks. Again let the possible states be $E_1, E_2,
\ldots, E_a$ but order them cyclically so that $E_a$ has the neighbors $E_{a-1}$
and $E_1$. If, as before, the system always passes either to the right
or to the left neighbor, the rows of the matrix P are as in example
(b), except that the first row is (0, p, 0, 0, ..., 0, q) and the last
(p, 0, 0, 0, ..., 0, q, 0).

More generally, we may permit transitions between any two states.
Let $q_0, q_1, \ldots, q_{a-1}$ be, respectively, the probability of staying fixed
or moving 1, 2, ..., $a - 1$ units to the right (where k units to the right
is the same as $a - k$ units to the left). Then P is the cyclical matrix

$$
P = \begin{pmatrix}
q_0 & q_1 & q_2 & \cdots & q_{a-2} & q_{a-1} \\
q_{a-1} & q_0 & q_1 & \cdots & q_{a-3} & q_{a-2} \\
q_{a-2} & q_{a-1} & q_0 & \cdots & q_{a-4} & q_{a-3} \\
\vdots & & & & & \vdots \\
q_1 & q_2 & q_3 & \cdots & q_{a-1} & q_0
\end{pmatrix}
$$

If $q_1 = p$, $q_{a-1} = q$, and $q_k = 0$ for $1 < k < a - 1$, then this random
walk reduces to the simple case discussed at the beginning of this
example. [The discussion is continued in example XVI(2.d).]
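The cyclical matrix is determined by its first row: the element in row i and column j depends only on $(j - i) \bmod a$. A short sketch follows, with rows and columns indexed from 0 and with hypothetical values for p and q; the function name `cyclic_matrix` is an assumption made for illustration.

```python
from fractions import Fraction

def cyclic_matrix(q):
    """Cyclical matrix with first row q = [q_0, ..., q_{a-1}].

    Entry (i, j), counted from 0, equals q[(j - i) mod a].
    """
    a = len(q)
    return [[q[(j - i) % a] for j in range(a)] for i in range(a)]

# Simple cyclical walk on a = 5 states: q_1 = p, q_{a-1} = q.
p, q = Fraction(2, 5), Fraction(3, 5)
zero = Fraction(0)
P = cyclic_matrix([zero, p, zero, zero, q])
```

The first and last rows reproduce the rows (0, p, 0, ..., 0, q) and (p, 0, ..., 0, q, 0) quoted at the beginning of the example.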

(e) Unrestricted random walks. An unrestricted one-dimensional
random walk is a Markov chain, but it is most natural to order the
states in a doubly infinite sequence $(\ldots, E_{-2}, E_{-1}, E_0, E_1, E_2, \ldots)$.
In order to write the matrix of transition probabilities in the familiar
form, we must rearrange the states. For example, for the ordering
$(E_0, E_1, E_{-1}, E_2, E_{-2}, \ldots)$ the first row of P becomes (0, p, q, 0, 0, ...),
the second (q, 0, 0, p, 0, 0, ...), etc. Unfortunately, the natural
symmetry is lost, and the formulas become unpleasant. The situation
grows even worse in two dimensions. In such cases the methods of
this chapter are not convenient for deriving explicit formulas, but the
general theorems apply and contain pertinent information.
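The rearrangement can be made explicit by a function giving the position of $E_i$ in the ordering $(E_0, E_1, E_{-1}, E_2, E_{-2}, \ldots)$. The sketch below (names `position` and `row` are illustrative assumptions, positions are counted from 0) reproduces the two rows quoted above from the rule that $E_i$ passes to $E_{i+1}$ with probability p and to $E_{i-1}$ with probability q.

```python
from fractions import Fraction

def position(i):
    """Index (from 0) of E_i in the ordering (E_0, E_1, E_-1, E_2, E_-2, ...)."""
    return 2 * i - 1 if i > 0 else -2 * i

def row(i, p, q, length):
    """First `length` entries of the row of P belonging to state E_i."""
    r = [Fraction(0)] * length
    for target, prob in ((i + 1, p), (i - 1, q)):
        if position(target) < length:
            r[position(target)] = prob
    return r

p, q = Fraction(1, 3), Fraction(2, 3)   # hypothetical values
first_row = row(0, p, q, 5)
second_row = row(1, p, q, 6)
```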

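(f) The Ehrenfest model of diffusion. Once more we consider a
chain with the $a + 1$ states $E_0, E_1, \ldots, E_a$ and transitions possible
only to the right and to the left neighbor; however, this time we put
$p_{j,j+1} = 1 - j/a$ and $p_{j,j-1} = j/a$, so that

$$
P = \begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1/a & 0 & 1 - 1/a & 0 & \cdots & 0 & 0 & 0 \\
0 & 2/a & 0 & 1 - 2/a & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 - 1/a & 0 & 1/a \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0
\end{pmatrix}
$$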
This chain has two interesting physical interpretations. For a
discussion of various recurrence problems in statistical mechanics P. and
T. Ehrenfest³ described a conceptual experiment where a molecules
are distributed in two containers A and B. At time n a molecule is
chosen at random and removed from its container to the other. The
state of the system is determined by the number of molecules in A.
Suppose that at a certain moment there are exactly k molecules in the
container A. At the next trial the system passes into $E_{k-1}$ or $E_{k+1}$
according to whether a molecule in A or B is chosen; the corresponding
probabilities are k/a and $(a - k)/a$, and therefore our chain describes
Ehrenfest's experiment. However, our chain can also be interpreted
as diffusion with a central force, that is, a random walk in which the
probability of a step to the right varies with the position. From x = j
the particle is more likely to move to the right or to the left according
as j < a/2 or j > a/2; this means that the particle has a tendency to
move toward x = a/2, which corresponds to an attractive elastic force
increasing in direct proportion to the distance. (The Ehrenfest model
has been described in example V(2.c); see also example (6.a) and
problem 12.)

3 P. and T. Ehrenfest, Über zwei bekannte Einwände gegen das Boltzmannsche
H-Theorem, Physikalische Zeitschrift, vol. 8 (1907), pp. 311-314. Ming Chen
Wang and G. E. Uhlenbeck, On the theory of the Brownian motion II, Reviews of
Modern Physics, vol. 17 (1945), pp. 323-342. For a more complete discussion (by
methods essentially equivalent to those of chapter XVI) see M. Kac, Random
walk and the theory of Brownian motion, American Mathematical Monthly, vol. 54
(1947), pp. 369-391. See also B. Friedman, A simple urn model, Communications
on Pure and Applied Mathematics, vol. 2 (1949), pp. 59-70.
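The Ehrenfest matrix follows directly from the rule $p_{j,j+1} = 1 - j/a$, $p_{j,j-1} = j/a$. The sketch below builds it for a small, arbitrarily chosen a (the function name `ehrenfest` is an illustrative assumption) and makes the pull toward the middle state visible in the rows on either side of a/2.

```python
from fractions import Fraction

def ehrenfest(a):
    """Transition matrix of the Ehrenfest chain on states 0..a."""
    P = [[Fraction(0)] * (a + 1) for _ in range(a + 1)]
    for j in range(a + 1):
        if j < a:
            P[j][j + 1] = 1 - Fraction(j, a)   # a molecule of B is chosen
        if j > 0:
            P[j][j - 1] = Fraction(j, a)       # a molecule of A is chosen
    return P

P = ehrenfest(4)   # hypothetical size a = 4
```

For j below a/2 the step to the right is the more probable one, and for j above a/2 the step to the left, in agreement with the central-force interpretation.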

(g) Occupancy problems. In chapter I we considered random
placements of balls into a cells. Let the number of occupied cells determine
the state of the system. If j cells are occupied, the probability that
the next ball is placed into an empty cell is $(a - j)/a$. Hence the
experiment is described by a chain with transition probabilities
$p_{jj} = j/a$, $p_{j,j+1} = (a - j)/a$, and $p_{jk} = 0$ for all other combinations
of j and k. The initial distribution (all cells empty) is given by $p_0 = 1$,
$p_k = 0$ for $1 \le k \le a$. [Cf. example XVI(2.e).]

(h) Success runs. In a sequence of Bernoulli trials we agree to say
that at time n we observe the state $E_0$ if the nth trial results in failure,
and the state $E_k$ (k = 1, 2, ..., n) if the last failure occurred at trial
number $n - k$ (the zero-th trial counting as failure). In other words,
the index k of the state $E_k$ indicates the length of the uninterrupted
sequence of successes ending at the nth trial. It is obvious that we are
dealing with a Markov chain in which only the transitions $E_k \to E_0$
and $E_k \to E_{k+1}$ are possible, and the matrix of transition probabilities
takes on the form

$$
P = \begin{pmatrix}
q & p & 0 & 0 & 0 & \cdots \\
q & 0 & p & 0 & 0 & \cdots \\
q & 0 & 0 & p & 0 & \cdots \\
\vdots & & & & &
\end{pmatrix}
$$

(i) Recurrent events. [...] interesting Markov chain. [...] times given by $\{f_k\}$.
Conventionally [...] ([...] manner of speaking, we are dealing with the waiting
time in the negative direction.) As in the last example, it is clear that the state
$E_k$ at the nth trial can be succeeded only by $E_0$ (if $\mathcal{E}$ occurs) or by $E_{k+1}$. Put

$$
s_k = f_1 + \cdots + f_k, \qquad q_k = \frac{f_{k+1}}{1 - s_k}, \qquad p_k = 1 - q_k = \frac{1 - s_{k+1}}{1 - s_k}. \tag{2.1}
$$

Observing $E_k$ means that the waiting time for $\mathcal{E}$ exceeds k, and the
probability of $\mathcal{E}$ occurring at the next trial under this hypothesis equals
$q_k$. Accordingly the transitions $E_k \to E_{k+1}$ and $E_k \to E_0$ have
probabilities $p_k$ and $q_k$, respectively. A typical sample sequence is of the
form $E_0 E_0 E_1 E_2 E_3 E_0 E_1 E_0 E_1 E_2 E_0 E_0$ (the first $E_0$ representing the
zero-th trial). Here the waiting times are successively 1, 4, 2, 3, 1,
and the probability of our sequence equals $f_1 f_4 f_2 f_3 f_1$. Now

$$
f_1 f_4 f_2 f_3 f_1 = q_0 \, p_0 p_1 p_2 q_3 \, p_0 q_1 \, p_0 p_1 q_2 \, q_0 \tag{2.2}
$$

in accordance with the rule (1.1) for probabilities in Markov chains.
This reasoning applies to all sequences, and we see that the process is
a Markov chain with the matrix

$$
P = \begin{pmatrix}
q_0 & p_0 & 0 & 0 & 0 & \cdots \\
q_1 & 0 & p_1 & 0 & 0 & \cdots \\
q_2 & 0 & 0 & p_2 & 0 & \cdots \\
\vdots & & & & &
\end{pmatrix}
$$

[Continued in example (6.c).]
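Identity (2.2) can be verified numerically for any particular recurrence-time distribution. The sketch below uses a hypothetical distribution $f_1, \ldots, f_4$ (chosen only so that the sums are simple) and the definitions of $s_k$, $q_k$, $p_k$ as reconstructed in (2.1); exact fractions are used so that the two sides can be compared for equality.

```python
from fractions import Fraction

# Hypothetical recurrence-time distribution f_1, ..., f_4 (sums to 1).
f = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def s(k):
    """s_k = f_1 + ... + f_k (with s_0 = 0)."""
    return sum(f[i] for i in range(1, k + 1))

def q(k):
    """Probability that the event occurs next, given a waiting time exceeding k."""
    return f[k + 1] / (1 - s(k))

def p(k):
    return 1 - q(k)

# Left and right sides of (2.2) for the sample sequence with
# waiting times 1, 4, 2, 3, 1.
lhs = f[1] * f[4] * f[2] * f[3] * f[1]
rhs = (q(0) * p(0) * p(1) * p(2) * q(3) * p(0) * q(1)
       * p(0) * p(1) * q(2) * q(0))
```

Each waiting time n contributes the factor $p_0 p_1 \cdots p_{n-2} q_{n-1} = f_n$, which is why the two products agree.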

(j) Sequential sampling. As we have seen in chapter XIV, section
8, the following problem occurs in sequential sampling. Let $S_n = X_1 + \cdots + X_n$,
where the $X_k$ are mutually independent random variables
assuming only integral values and having a common distribution $\{p_k\}$,
k = 0, ±1, ±2, .... For preassigned z > 0, b > 0 there exists a
smallest n for which either $S_n \ge b$ or $S_n \le -z$. This n is, of course, a
random variable, and we are interested in its distribution and in the
probabilities of the two contingencies $S_n \le -z$ and $S_n \ge b$.

The problem can be formulated in terms of a Markov chain with
states 0, 1, 2, ..., $b + z$ as follows. Let $a = b + z - 1$ and choose z
for the initial state. We say that at time n the system is in the state
x (where x = 1, 2, ..., a) if $z + S_n = x$ provided, however, none of the
sums $z + S_1, \ldots, z + S_{n-1}$ is $\le 0$ or $> a$; and with the same proviso we
say that the system is in state 0 if $z + S_n \le 0$ and in state $a + 1$ if
$z + S_n \ge a + 1$. Once the system passes into one of the two limiting
states 0 and $a + 1$, it remains there forever (that is, we put
$p_{0,0} = p_{a+1,a+1} = 1$). The matrix of transition probabilities is

$$
P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
r_1 & p_0 & p_1 & p_2 & \cdots & p_{a-1} & \rho_1 \\
r_2 & p_{-1} & p_0 & p_1 & \cdots & p_{a-2} & \rho_2 \\
r_3 & p_{-2} & p_{-1} & p_0 & \cdots & p_{a-3} & \rho_3 \\
\vdots & & & & & & \vdots \\
r_a & p_{-a+1} & p_{-a+2} & p_{-a+3} & \cdots & p_0 & \rho_a \\
0 & 0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}
$$

where

$$
r_k = p_{-k} + p_{-k-1} + p_{-k-2} + \cdots
$$

and

$$
\rho_k = p_{a-k+1} + p_{a-k+2} + \cdots.
$$

As an illustration, take Bartky's double-sampling inspection scheme.
To test a consignment of items, samples of size N are taken and
subjected to complete inspection. It is assumed that the samples are
stochastically independent and that the number of defectives in each
has the same binomial distribution. Allowance is made for one
defective item per sample, and so we let $X_k + 1$ equal the number of
defectives in the kth sample. Then for $k \ge 0$

$$
p_k = \binom{N}{k+1} p^{k+1} q^{N-k-1} \tag{2.3}
$$

and $p_{-1} = q^N$, $p_x = 0$ for $x < -1$. The procedural rule is as follows:
A preliminary sample is drawn and, if it contains no defective, the
whole consignment is accepted; if the number of defectives exceeds a,
the whole lot is rejected. In either of these cases the process stops
and we have no Markov chain. If, however, the number z of defectives
lies in the range $1 \le z \le a$, the sampling continues in the described
way as long as the state of the chain is contained between 1 and a.
Sooner or later it will pass either into 0, in which case the consignment
is accepted, or into $a + 1$, in which case the consignment is rejected.
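The matrix of the sequential-sampling chain can be assembled from the step distribution $\{p_k\}$ by sending every overshoot below 1 into state 0 and every overshoot above a into state $a + 1$; this produces the tail sums $r_k$ and $\rho_k$ automatically. The sketch below does this for Bartky's scheme. The function names and the numerical values N = 3, p = 1/10, a = 2 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def bartky_steps(N, p):
    """Distribution {p_k} of X = (number of defectives) - 1, as in (2.3)."""
    q = 1 - p
    return {k: comb(N, k + 1) * p ** (k + 1) * q ** (N - k - 1)
            for k in range(-1, N)}

def sampling_matrix(a, pk):
    """Chain on states 0..a+1; the limiting states 0 and a+1 are absorbing."""
    P = [[Fraction(0)] * (a + 2) for _ in range(a + 2)]
    P[0][0] = P[a + 1][a + 1] = Fraction(1)
    for x in range(1, a + 1):
        for k, prob in pk.items():
            y = min(max(x + k, 0), a + 1)   # overshoots land in 0 or a+1
            P[x][y] += prob
    return P

pk = bartky_steps(3, Fraction(1, 10))   # hypothetical N = 3, p = 1/10
P = sampling_matrix(2, pk)              # hypothetical a = 2
```

Here the first column of row x is $r_x$ and the last column is $\rho_x$, as in the matrix displayed above.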
(k) An example from genetics. Consider a population kept constant
in size by the selection of N individuals in each successive generation.
A particular gene assuming the forms A and a has 2N representatives;
if in the nth generation A occurs j times, then a occurs $2N - j$ times. In
this case we say that the population is at time n in state j ($0 \le j \le 2N$).
Assuming random mating, the composition of the following generation
is determined by 2N Bernoulli trials in which the A-gene has probability
j/2N. We have therefore a Markov chain with

$$
p_{jk} = \binom{2N}{k} \left(\frac{j}{2N}\right)^{k} \left(1 - \frac{j}{2N}\right)^{2N-k}.
$$

[Cf. example (8.c).]
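The binomial formula for $p_{jk}$ translates directly into code. The following sketch (function name and the choice N = 2 are illustrative assumptions) builds the matrix with exact fractions; the first and last rows show that the states 0 and 2N, in which one of the two forms of the gene has disappeared, cannot be left.

```python
from fractions import Fraction
from math import comb

def genetics_matrix(N):
    """p_jk = C(2N, k) (j/2N)^k (1 - j/2N)^(2N-k) for 0 <= j, k <= 2N."""
    m = 2 * N
    return [[comb(m, k) * Fraction(j, m) ** k * (1 - Fraction(j, m)) ** (m - k)
             for k in range(m + 1)]
            for j in range(m + 1)]

P = genetics_matrix(2)   # hypothetical population with N = 2, states 0..4
```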

(l) A breeding problem. In the so-called brother-sister mating two
individuals are mated, and among their direct descendants two
individuals of opposite sex are selected at random. These are again mated,
and the process continues indefinitely. With three genotypes AA, Aa,
aa for each parent, we have to distinguish six combinations of parents
which we label as follows: $E_1 = AA \times AA$, $E_2 = AA \times Aa$,
$E_3 = Aa \times Aa$, $E_4 = Aa \times aa$, $E_5 = aa \times aa$, $E_6 = AA \times aa$. Using
the rules of chapter V, it is easily seen that the matrix of transition
probabilities is in this case

$$
P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
\tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} & 0 & 0 & 0 \\
\tfrac{1}{16} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{16} & \tfrac{1}{8} \\
0 & 0 & \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{4} & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix}
$$

[The discussion is continued in problem 4; a complete treatment is
given in example XVI(4.b).]
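The entries of this matrix can be recomputed from the usual rule that each parent transmits one of its two genes with probability 1/2. The sketch below assumes, in addition, that the genotypes of the selected brother and sister are independent draws from the offspring distribution and that sex is unrelated to genotype; the names `offspring` and `breeding_matrix` are illustrative.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")
# Parent pairs in the order E_1, ..., E_6 used in the text.
STATES = (("AA", "AA"), ("AA", "Aa"), ("Aa", "Aa"),
          ("Aa", "aa"), ("aa", "aa"), ("AA", "aa"))

def offspring(g1, g2):
    """Genotype distribution of one descendant of parents g1 and g2."""
    dist = {g: Fraction(0) for g in GENOTYPES}
    for x, y in product(g1, g2):                 # one gene from each parent
        dist["".join(sorted(x + y))] += Fraction(1, 4)
    return dist

def breeding_matrix():
    index = {s: i for i, s in enumerate(STATES)}
    P = [[Fraction(0)] * 6 for _ in range(6)]
    for i, (g1, g2) in enumerate(STATES):
        d = offspring(g1, g2)
        for c1, c2 in product(GENOTYPES, repeat=2):   # brother, sister
            P[i][index[tuple(sorted((c1, c2)))]] += d[c1] * d[c2]
    return P

P = breeding_matrix()
```

Under these assumptions the computed rows agree with the matrix displayed above; for instance, $AA \times aa$ produces only Aa descendants, so that $E_6$ passes into $E_3$ with probability one.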


3. HIGHER TRANSITION PROBABILITIES 


A transition from $E_j$ to $E_k$ in exactly n steps can occur via different
paths $E_j \to E_{j_1} \to E_{j_2} \to \cdots \to E_{j_{n-1}} \to E_k$. The conditional
probability that the system passes through this particular path given that
it is at $E_j$ is $p_{jj_1} p_{j_1 j_2} \cdots p_{j_{n-1}k}$. The sum of the corresponding
expressions for all possible paths is the probability of finding the system at
time $r + n$ in state $E_k$, given that at time r it was in state $E_j$. We denote
this probability by $p_{jk}^{(n)}$. In particular, $p_{jk}^{(1)} = p_{jk}$ and

$$
p_{jk}^{(2)} = \sum_{\nu} p_{j\nu} p_{\nu k}. \tag{3.1}
$$

By induction we find easily the recursion formula

$$
p_{jk}^{(n+1)} = \sum_{\nu} p_{j\nu} p_{\nu k}^{(n)}; \tag{3.2}
$$

a further induction on m shows that more generally

$$
p_{jk}^{(m+n)} = \sum_{\nu} p_{j\nu}^{(m)} p_{\nu k}^{(n)}. \tag{3.3}
$$

This equation reflects the simple fact that the first m steps lead the
system from $E_j$ to some intermediate state $E_\nu$, and the last n steps
from $E_\nu$ to $E_k$. The identity (3.3) is characteristic for Markov chains.
For more general processes (cf. section 10) an analogous equation holds,
but the last factor depends not only on $\nu$ and k but also on j.

4 This problem was discussed at length by R. A. Fisher and S. Wright. The
formulation in terms of Markov chains is due to G. Malécot, Sur un problème de
probabilités en chaîne que pose la génétique, Comptes rendus de l'Académie des
Sciences, vol. 219 (1944), pp. 379-381.

In the same way as the $p_{jk}$ form the matrix P, we arrange the $p_{jk}^{(n)}$ in
a matrix to be denoted by $P^n$. Equation (3.2) states that to obtain the
element $p_{jk}^{(n+1)}$ of $P^{n+1}$ we have to multiply the elements of the jth
row of P by the corresponding elements of the kth column of $P^n$ and
add all products. This operation is called row-into-column multiplication
of the matrices P and $P^n$ and is expressed symbolically by the
equation $P^{n+1} = PP^n$. This suggests calling $P^n$ the nth power of P;
equation (3.3) expresses the associative law $P^{m+n} = P^m P^n$.

In order to have (3.3) true for all $n \ge 0$ we define $p_{jk}^{(0)}$ by $p_{jj}^{(0)} = 1$
and $p_{jk}^{(0)} = 0$ for $j \neq k$, as is natural.
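For a finite chain the higher transition probabilities can be computed exactly as described: row-into-column multiplication gives $P^{n+1} = PP^n$, starting from the identity matrix $P^0$. The sketch below uses a hypothetical two-state matrix and checks relation (3.3) in the form $P^{m+n} = P^m P^n$; the helper names are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    """Row-into-column multiplication of two matrices given as lists of rows."""
    return [[sum(A[j][v] * B[v][k] for v in range(len(B)))
             for k in range(len(B[0]))]
            for j in range(len(A))]

def matpow(P, n):
    """n-th power of P, built from P^0 = identity by P^(n+1) = P P^n, as in (3.2)."""
    size = len(P)
    result = [[Fraction(int(j == k)) for k in range(size)] for j in range(size)]
    for _ in range(n):
        result = matmul(P, result)
    return result

# Hypothetical two-state chain.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
P5 = matpow(P, 5)
```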


Examples. (a) In the trivial case of independent trials all rows of
P are identical, and it is clear without calculations that $P^n = P$ for
all n.

(b) In the success run, example (2.h), the n-step transition
probabilities can be written down directly. For example, in three steps the
system can pass from $E_k$ only into $E_{k+3}$, $E_0$, $E_1$, $E_2$ and the
corresponding probabilities are clearly $p^3$, q, qp, $qp^2$. Thus

$$
P^2 = \begin{pmatrix}
q & qp & p^2 & 0 & 0 & \cdots \\
q & qp & 0 & p^2 & 0 & \cdots \\
q & qp & 0 & 0 & p^2 & \cdots \\
\vdots & & & & &
\end{pmatrix}, \qquad
P^3 = \begin{pmatrix}
q & qp & qp^2 & p^3 & 0 & 0 & \cdots \\
q & qp & qp^2 & 0 & p^3 & 0 & \cdots \\
q & qp & qp^2 & 0 & 0 & p^3 & \cdots \\
\vdots & & & & & &
\end{pmatrix}
$$

In this case it is clear that $P^n$ converges to a matrix such that all
elements in the column number k equal $qp^k$.
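The rows of $P^3$ can be checked on a finite block of the infinite success-run matrix. The sketch below rests on one assumption worth stating: in three steps the chain starting from $E_k$ cannot go beyond $E_{k+3}$, so the first rows of the cube of an 8 by 8 block coincide with the corresponding rows of the true $P^3$ (the last rows of the truncated block are not meaningful). The value p = 1/3 is hypothetical.

```python
from fractions import Fraction

def success_run_block(size, p):
    """Top-left size x size block of the infinite success-run matrix."""
    q = 1 - p
    P = [[Fraction(0)] * size for _ in range(size)]
    for k in range(size):
        P[k][0] = q                  # failure: back to E_0
        if k + 1 < size:
            P[k][k + 1] = p          # success: run length grows by one
    return P

def matmul(A, B):
    return [[sum(A[j][v] * B[v][k] for v in range(len(B)))
             for k in range(len(B[0]))]
            for j in range(len(A))]

p = Fraction(1, 3)
q = 1 - p
P = success_run_block(8, p)
P3 = matmul(P, matmul(P, P))
```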


Absolute Probabilities 

Let again $a_j$ stand for the probability of the state $E_j$ at time 0. The
(unconditional) probability of finding the system at time n in state $E_k$
is then

$$
a_k^{(n)} = \sum_j a_j p_{jk}^{(n)}. \tag{3.4}
$$

Usually we let the process start from a fixed state $E_i$, that is, we put
$a_i = 1$. In this case $a_k^{(n)} = p_{ik}^{(n)}$.

We feel intuitively that the influence of the initial state on the
probability distribution at time n should gradually wear off so that for
large n the distribution (3.4) should be nearly independent of the
initial distribution $\{a_j\}$. This is the case if (as in the last example)
$p_{jk}^{(n)}$ converges to a limit independent of j, that is, if $P^n$ converges to a
matrix with identical rows. We shall see that this is usually so, but
once more we shall have to take into account the annoying exception
caused by periodicities.

4. CLOSURES AND CLOSED SETS


We shall say that $E_k$ can be reached from $E_j$ if there exists some $n \ge 0$
such that $p_{jk}^{(n)} > 0$ (i.e., if there is a positive probability of reaching $E_k$
from $E_j$, including the case $E_k = E_j$). For example, in an unrestricted
random walk each state can be reached from every other state, but
from an absorbing barrier no other state can be reached.

Definition. A set C of states is closed if no state outside C can be
reached from any state $E_j$ in C. The smallest closed set containing C is
called the closure of C.

A single state $E_k$ forming a closed set will be called absorbing.

A Markov chain is irreducible if there exists no closed set other than
the set of all states.
Clearly C is closed if, and only if, $p_{jk} = 0$ whenever j is in C and k
outside C, for in this case we see from (3.2) that $p_{jk}^{(n)} = 0$ for every n.
We have then the obvious

Theorem. If in the matrices $P^n$ all rows and all columns corresponding
to states outside the closed set C are deleted, there remain stochastic
matrices for which the fundamental relations (3.2) and (3.3) again hold.

This means that we have a Markov chain defined on C, and this
subchain can be studied independently of all other states.

The state $E_k$ is absorbing if, and only if, $p_{kk} = 1$; in this case the
matrix of the last theorem reduces to a single element. The closure of
a single state $E_j$ is the set of all states which can be reached from it
(including $E_j$). This remark may be reformulated in the form of the
following useful

Criterion. A chain is irreducible if, and only if, every state can be
reached from every other state.
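For a finite chain both the closure of a state and the criterion for irreducibility reduce to a search over the positive elements of P. The sketch below is one way to carry this out; the function names are illustrative, states are numbered from 0, and the two small matrices (an absorbing walk and a cyclical walk with p = q = 1/2) are chosen only as test cases.

```python
from fractions import Fraction

def reachable_from(P, j):
    """Closure of the single state j: all k with p_jk^(n) > 0 for some n >= 0."""
    seen, stack = {j}, [j]
    while stack:
        i = stack.pop()
        for k, x in enumerate(P[i]):
            if x > 0 and k not in seen:
                seen.add(k)
                stack.append(k)
    return seen

def is_closed(P, C):
    """C is closed if p_jk = 0 whenever j is in C and k is outside C."""
    return all(P[j][k] == 0 for j in C for k in range(len(P)) if k not in C)

def is_irreducible(P):
    """Criterion of the text: every state can be reached from every other."""
    return all(len(reachable_from(P, j)) == len(P) for j in range(len(P)))

h = Fraction(1, 2)
walk = [[1, 0, 0, 0], [h, 0, h, 0], [0, h, 0, h], [0, 0, 0, 1]]   # absorbing barriers
cycle = [[0, h, h], [h, 0, h], [h, h, 0]]                          # cyclical walk
```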


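Example. In order to find all closed sets it suffices to know which
$p_{jk}$ vanish and which are positive. Accordingly, we use a * to denote
positive elements and consider a typical matrix, say

$$
P = \begin{pmatrix}
0 & 0 & 0 & * & 0 & 0 & 0 & 0 & * \\
0 & * & * & 0 & * & 0 & 0 & 0 & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & * & 0 \\
* & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & * & 0 & 0 & 0 & 0 \\
0 & * & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & * & 0 & 0 & 0 & * & * & 0 & 0 \\
0 & 0 & * & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & * & 0 & 0 & 0 & 0 & *
\end{pmatrix}
$$
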
In the fifth row a * appears only at the fifth place, and therefore
$p_{55} = 1$: the state $E_5$ is absorbing. The third and the eighth row
contain only one positive element each, and it is clear that $E_3$ and $E_8$
form a closed set. From $E_1$ passages are possible into $E_4$ and $E_9$,
and from there only to $E_1$, $E_4$, $E_9$. Accordingly the three states $E_1$,
$E_4$, $E_9$ form another closed set.

It is now apparent that the complication of P arises mainly from an
inconvenient notation. Let us relabel the states as follows:

$E'_1 = E_5$; $E'_2 = E_3$; $E'_3 = E_8$; $E'_4 = E_1$; $E'_5 = E_9$;
$E'_6 = E_4$; $E'_7 = E_2$; $E'_8 = E_7$; $E'_9 = E_6$.


The elements of the matrix P are rearranged in like manner, and P
takes on the form

$$
P' = \begin{pmatrix}
* & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & * & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & * & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & * & * & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & * & * & 0 & 0 & 0 \\
0 & 0 & 0 & * & 0 & 0 & 0 & 0 & 0 \\
* & * & 0 & 0 & * & 0 & * & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & * & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & * & 0 & 0
\end{pmatrix}
$$

In this form the closed sets $(E'_1)$, $(E'_2, E'_3)$ and $(E'_4, E'_5, E'_6)$ are
evident. From $E'_7$ a passage is possible into each of these three closed
sets, and therefore the closure of $E'_7$ is the set of states $E'_1$, $E'_2$, $E'_3$,
$E'_4$, $E'_5$, $E'_6$, $E'_7$. From $E'_8$ a passage is possible into $E'_7$ and $E'_9$ and
hence into each closed set: the closures of $E'_8$ and of $E'_9$ consist of
all nine states.

Deleting all rows and all columns outside a closed set, we obtain the
three stochastic submatrices

$$
(*), \qquad
\begin{pmatrix} 0 & * \\ * & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & * & * \\ 0 & * & * \\ * & 0 & 0 \end{pmatrix} \tag{4.1}
$$

and P' contains no other stochastic submatrices.

The reader is asked to find for himself the absorbing states and the
closed sets in the matrices of the examples of section 2.


5. CLASSIFICATION OF STATES 


Consider an arbitrary, but fixed, state $E_j$ and suppose that initially
the system is in $E_j$. Every time the system passes through $E_j$ the
process recommences from scratch exactly as it has begun. It is
therefore clear that the return to $E_j$ is a recurrent event as defined in chapter
XIII. If the system starts from another state $E_i$, then the passage
through $E_j$ becomes a delayed recurrent event as defined in chapter XIII,
section 5. It should therefore be clear that Markov chains are but a
special case of recurrent events; the only new feature is that we are
dealing with many recurrent events simultaneously.

Each state $E_j$ is characterized by its recurrence time distribution $\{f_j^{(n)}\}$.
Here $f_j^{(n)}$ is the probability that the first return to $E_j$ occurs at time n.
Starting from the $p_{jj}^{(n)}$, we can calculate the $f_j^{(n)}$ using the obvious
recurrence relations⁵

$$
\begin{aligned}
f_j^{(1)} &= p_{jj}, \qquad f_j^{(2)} = p_{jj}^{(2)} - f_j^{(1)} p_{jj}, \quad \ldots, \\
f_j^{(n)} &= p_{jj}^{(n)} - f_j^{(1)} p_{jj}^{(n-1)} - f_j^{(2)} p_{jj}^{(n-2)} - \cdots - f_j^{(n-1)} p_{jj},
\end{aligned} \tag{5.1}
$$

which, of course, are only a special case of the basic relation XIII(3.1)
for recurrent events. The sum

$$
f_j = \sum_{n=1}^{\infty} f_j^{(n)} \tag{5.2}
$$

is the probability that, starting from $E_j$, the system ever returns to $E_j$.
The state $E_j$ is persistent if $f_j = 1$; in this case the mean recurrence time is

$$
\mu_j = \sum_{n=1}^{\infty} n f_j^{(n)}. \tag{5.3}
$$

We shall call $E_j$ a null state if $\mu_j = \infty$.

If the system starts at $E_i$, the waiting time up to the first passage
through $E_j$ has a distribution $\{f_{ij}^{(n)}\}$ where

$$
f_{ij}^{(1)} = p_{ij}, \qquad f_{ij}^{(n)} = p_{ij}^{(n)} - \sum_{\nu=1}^{n-1} f_{ij}^{(\nu)} p_{jj}^{(n-\nu)}. \tag{5.4}
$$

Again this equation is not specific to Markov chains but is valid for
arbitrary delayed recurrent events. Of course, if $E_j$ cannot be reached
from $E_i$, then $f_{ij}^{(n)} = 0$ for all n. In general,

$$
f_{ij} = \sum_{n=1}^{\infty} f_{ij}^{(n)} \tag{5.5}
$$

is the probability that, starting from $E_i$, the system ever reaches $E_j$.
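The recurrence relations for the first-return probabilities are easily put to work once the return probabilities $p_{jj}^{(n)}$ are known. As an illustration, take the state $E_0$ of the success-run chain, example (2.h), where $p_{00}^{(n)} = q$ for every $n \ge 1$; the first return then requires $n - 1$ successes followed by a failure, so one expects $f_0^{(n)} = p^{n-1} q$. The sketch below (the function name and the value p = 1/3 are assumptions) confirms this for the first few n.

```python
from fractions import Fraction

def first_return_probs(u):
    """Given u[n] = p_jj^(n) for n = 0..N (u[0] = 1), return f with
    f[n] = f_j^(n), computed from the recurrence relations for first returns."""
    f = [Fraction(0)] * len(u)
    for n in range(1, len(u)):
        f[n] = u[n] - sum(f[v] * u[n - v] for v in range(1, n))
    return f

p = Fraction(1, 3)
q = 1 - p
u = [Fraction(1), q, q, q, q]        # return probabilities to E_0
f = first_return_probs(u)
```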


5 They state that the probability of a first return to $E_j$ at time n equals the
probability of a return at time n minus the probability that the first return takes place
at some time $\nu = 1, 2, \ldots, n - 1$ and is followed by a repeated return at time n.
In the notation of XIII(3.1), we have $p_{jj}^{(n)} = u_n$ and $f_j^{(n)} = f_n$.

We can now summarize the basic facts proved in chapter XIII,
sections 3 and 5, as follows:
(i) A state $E_j$ is transient if $f_j < 1$. Necessary and sufficient for this
is the condition that $\sum_{n=1}^{\infty} p_{jj}^{(n)} < \infty$. In this case automatically⁶
$\sum_{n=1}^{\infty} p_{ij}^{(n)} < \infty$ for each i.

(ii) A state $E_j$ is a persistent null state if $f_j = 1$, but the mean
recurrence time $\mu_j = \infty$. Necessary and sufficient for this is the condition that
$\sum_{n=1}^{\infty} p_{jj}^{(n)} = \infty$ but $p_{jj}^{(n)} \to 0$. In this case⁶

$$
p_{ij}^{(n)} \to 0 \quad \text{as } n \to \infty \tag{5.6}
$$

for each i.

(iii) The state $E_j$ has period t > 1 if $p_{jj}^{(n)} = 0$ whenever n is not divisible
by t and t is the largest integer with this property (i.e., a return to $E_j$ is
impossible except, perhaps, in t, 2t, 3t, ... steps).

(iv) If $E_j$ is persistent and aperiodic (not periodic), then⁶

$$
p_{ij}^{(n)} \to \mu_j^{-1} f_{ij} \quad \text{as } n \to \infty \tag{5.7}
$$

and, in particular,

$$
p_{jj}^{(n)} \to \mu_j^{-1} \quad \text{as } n \to \infty. \tag{5.8}
$$

(If $E_j$ is a null state, set $\mu_j^{-1} = 0$.)

(v) If $E_j$ is persistent and has period t, then (5.8) is to be replaced by

$$
p_{jj}^{(nt)} \to t \mu_j^{-1} \quad \text{as } n \to \infty. \tag{5.9}
$$

Persistent⁷ states which are neither periodic nor null states will be called
ergodic.

6 This follows trivially from (5.4) but is really a special case of the theorem in
chapter XIII, section 5.

7 Unfortunately, no generally accepted terminology exists. In the first edition
the persistent states were called recurrent, which causes confusion by obscuring the
parallelism between Markov chains and recurrent events. Kolmogorov calls
transient states unessential, but new research has shown that the main interest,
both theoretical and practical, centers on transient states. The term ergodic,
being synonymous with “persistent, non-null, non-periodic,” is rather generally
accepted, but “positive” state is one of the existing alternatives, and sometimes
“ergodic” is equated to persistent.

Examples. (a) Consider the matrix P' of the example in section 4
(omitting the dashes). The state $E_1$, being absorbing, is persistent.
From $E_2$ the system necessarily passes into $E_3$ and from there back
into $E_2$. Therefore $E_2$ and $E_3$ are persistent states, with period 2 and
mean recurrence time 2. The states $E_4$, $E_5$, $E_6$ form a closed subset,
and the transitions between them are regulated by the last matrix
shown in (4.1). It is clear that these states are persistent and
non-periodic. (We shall see later on that in a finite Markov chain no
persistent null states are possible.)

From $E_7$ a passage into one of these closed sets is possible, and then
the system stays in that closed set forever. Therefore $E_7$ is transient.
From $E_9$ the system passes into $E_7$ and no return to $E_9$ is possible;
therefore $E_9$ too is transient. Finally, starting from $E_8$, the system
will sooner or later pass into $E_7$ or $E_9$, never to return. Accordingly
$E_7$, $E_8$, $E_9$ are transient.

(b) We recall that in an unrestricted random walk [example (2.e)]
all states are persistent if p = q and are transient otherwise [see
example (8.d)].

It is not always easy to decide whether or not a given state is
persistent, and the criterion that $\sum_n p_{jj}^{(n)}$ should diverge is usually too
difficult to apply. A better criterion is contained in the theorem of the
next section.

Let $E_j$ be a fixed persistent state and $E_k$ some other state which can
be reached from it. Furthermore, let N be the length of the shortest
possible path from $E_j$ to $E_k$, and put $p_{jk}^{(N)} = \alpha > 0$. A return from $E_k$
to $E_j$ must have positive probability, for otherwise the probability of
the system's not returning to $E_j$ would be at least $\alpha$, and $f_j \le 1 - \alpha < 1$
contrary to the assumption that $E_j$ is persistent. It follows that there
exists an index M such that $p_{kj}^{(M)} = \beta > 0$. Now for any n we have
obviously

$$
p_{jj}^{(n+N+M)} \ge p_{jk}^{(N)} p_{kk}^{(n)} p_{kj}^{(M)} = \alpha\beta \, p_{kk}^{(n)} \tag{5.10}
$$

and

$$
p_{kk}^{(n+N+M)} \ge p_{kj}^{(M)} p_{jj}^{(n)} p_{jk}^{(N)} = \alpha\beta \, p_{jj}^{(n)}. \tag{5.11}
$$

These relations imply that the sequences $p_{jj}^{(n)}$ and $p_{kk}^{(n)}$ have the same
asymptotic behavior, and from this we can draw important conclusions.
To begin with, $E_j$ was assumed persistent, and therefore the
series $\sum p_{jj}^{(n)}$ diverges. From (5.11) it follows that also $\sum p_{kk}^{(n)}$ diverges,
so that $E_k$ must be persistent. If $p_{jj}^{(n)} \to 0$, then also $p_{kk}^{(n)} \to 0$, and
vice versa. Finally, suppose that $E_j$ has period t. Since a return to
$E_j$ is possible in $N + M$ steps, $N + M$ must be a multiple of t. It
follows then from (5.10) and (5.11) that $E_j$ and $E_k$ must have the same
period.
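That mutually reachable states share a common period can be observed on small examples. The sketch below computes, for a finite chain, the greatest common divisor of all $n \le$ `max_n` with $p_{jj}^{(n)} > 0$, tracking only which elements of the jth row of $P^n$ are positive. The cut-off `max_n` is an assumption of this sketch (a bounded search, not a proof), and the two cyclical walks used as test cases are given by their patterns of positive elements.

```python
from math import gcd

def period(P, j, max_n=50):
    """gcd of all n <= max_n with p_jj^(n) > 0 (returns 0 if none is found)."""
    size = len(P)
    support = [k == j for k in range(size)]        # positive entries of row j of P^0
    g = 0
    for n in range(1, max_n + 1):
        support = [any(support[v] and P[v][k] > 0 for v in range(size))
                   for k in range(size)]
        if support[j]:
            g = gcd(g, n)
    return g

# Patterns (1 = positive element) of cyclical walks on 4 and on 3 states.
cycle4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
cycle3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
periods4 = [period(cycle4, j) for j in range(4)]
periods3 = [period(cycle3, j) for j in range(3)]
```

On four states every return takes an even number of steps, so all states have period 2; on three states returns are possible in both 2 and 3 steps, and the states are aperiodic.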

We see thus that from a persistent state only persistent states can be
reached, and they are all of the same type: Either they are all null states,
or all ergodic, or all periodic non-null states with the same period. The
closure C of a persistent state $E_j$ is an irreducible set, and its submatrix
defines a Markov chain on it which can be treated independently of
the rest. We have thus proved the important

Theorem. In an irreducible Markov chain all states belong to the
same class: they are all transient, all persistent null states, or all persistent
non-null states. In every case they have the same period. Moreover, every
state can be reached from every other state.

In every chain the persistent states can, in a unique manner, be divided
into closed sets $C_1, C_2, \ldots$ such that from any state of a given set $C_\nu$ all
states of that set and no other can be reached. All states belonging to the
same closed set $C_\nu$ are necessarily of the same class.

In addition to the closed sets $C_\nu$ the chain will in general contain
transient states from which states of the closed sets $C_\nu$ can be reached (but not
vice versa).


This theorem has the interesting 


Corollary. In a finite Markov chain there exist no null states, and it
is impossible that all states are transient.


Proof. It suffices to consider irreducible chains. If all states were
either transient or null states, we would have $p_{jk}^{(n)} \to 0$ as $n \to \infty$ for
each fixed pair j, k. Each row of $P^n$ would tend to zero while the row
sums equal unity. This is clearly impossible in the case of finitely
many terms, and we conclude that in an irreducible chain there exist
neither transient nor null states.


It follows that after an appropriate renumbering of the states (such
as was used in the example of section 4) the matrix P corresponding to
a chain with, say, two closed sets $C_1$ and $C_2$ and additional transient
states can be written schematically in the form of a partitioned matrix

$$
P = \begin{pmatrix}
P_1 & 0 & 0 \\
0 & P_2 & 0 \\
A & B & C
\end{pmatrix} \tag{5.12}
$$

where $P_1$ and $P_2$ are the matrices of transition probabilities within the
two closed sets. The matrix $P^n$ is then of the same type with $P_1$, $P_2$,
C replaced by $P_1^n$, $P_2^n$, $C^n$ (and A and B by more complicated matrices
to be studied in section 8). Note that $P_1$, $P_2$, and C are square matrices,
but A and B may be rectangular matrices as in the example.

6. ERGODIC PROPERTIES OF IRREDUCIBLE CHAINS


In this section we restrict the discussion to aperiodic chains; as with
recurrent events in general, the modifications required for periodic
chains are rather trite, but the formulations become unpleasantly
involved.


Definition. A probability distribution $\{v_k\}$ is called stationary if

$$
v_k = \sum_j v_j p_{jk}. \tag{6.1}
$$

If the initial distribution $\{a_k\}$ happens to be stationary, then the
absolute probabilities $\{a_k^{(n)}\}$ are independent of the time n, that is,
$a_k^{(n)} = a_k$. The physical significance of stationarity becomes apparent
if we imagine a large number of processes going on simultaneously.
Let, for example, N particles perform independently the same type of
random walk. At time n the expected number of particles in state $E_k$
is $N a_k^{(n)}$. With a stationary distribution these expected numbers
remain constant, and we observe (if N is large so that the law of large
numbers applies) a state of macroscopic equilibrium maintained by a
large number of transitions in opposite directions. Most statistical
equilibria in physics are of this kind; that is, they are due exclusively
to the simultaneous observation of many independent particles.
Typical is the case of a symmetric random walk (or diffusion): if many
particles are observed, then, after a sufficiently long time, roughly half of
them will be to the right, the other to the left of the origin.
Nevertheless, we know from the arc sine law of chapter III, section 5, that the
majority of the particles individually will misbehave and spend a
disproportionately large part of the time on the same side of the origin.
Many protracted discussions and erroneous conclusions could be
avoided by the realization that the notion of statistical equilibrium (or
the steady state) does not say anything concerning the behavior of the
individual particle. This should be borne in mind in connection with
the next theorem which is frequently described as asserting a
“tendency toward equilibrium.”


Theorem. An irreducible aperiodic Markov chain belongs to one of
the following two classes:

(a) Either the states are all transient or all null states; in this case
$p_{jk}^{(n)} \to 0$ as $n \to \infty$ for each pair j, k and there exists no stationary
distribution.

(b) Or else, all states are ergodic, that is

$$
\lim_{n \to \infty} p_{jk}^{(n)} = u_k > 0 \tag{6.2}
$$

where $u_k$ is the reciprocal of the mean recurrence time of $E_k$. In this case
$\{u_k\}$ is a stationary distribution and there exists no other stationary
distribution.


A slight reformulation may explain the implications of this theorem. 
If (6.2) holds, then for an arbitrary initial distribution $\{a_k\}$

(6.3)    $a_k^{(n)} = \sum_j a_j p_{jk}^{(n)} \to u_k$.

Therefore: If there exists a stationary distribution, it is necessarily unique
and the distribution at time n tends to it irrespective of the initial distribution.
The only alternative to this situation is that $p_{jk}^{(n)} \to 0$.
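The statement (6.3) can be illustrated numerically. The following is a minimal Python sketch, not part of the original text: the three-state matrix `P` and the helper names `step` and `distribution_at` are assumptions chosen only for illustration. Two different initial distributions are pushed through many transitions; both end at the same limit, which one further step leaves unchanged.

```python
def step(dist, P):
    """One transition: the row vector dist multiplied by the matrix P."""
    n = len(P)
    return [sum(dist[j] * P[j][k] for j in range(n)) for k in range(n)]


def distribution_at(initial, P, n_steps):
    """Absolute probabilities a_k^(n) after n_steps transitions."""
    dist = list(initial)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist


# Hypothetical irreducible aperiodic chain (every entry is positive).
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
]

from_state_0 = distribution_at([1.0, 0.0, 0.0], P, 200)
from_state_2 = distribution_at([0.0, 0.0, 1.0], P, 200)

print(from_state_0)           # limit when starting in the first state
print(from_state_2)           # limit when starting in the last state
print(step(from_state_0, P))  # one more step leaves the limit unchanged
```

The number of steps (200) is an arbitrary choice that is ample for this small matrix; slower-mixing chains would need more.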


Proof. The preceding theorem assures us that (6.2) holds whenever
the states are ergodic. To prove assertion (b) above we first note that

(6.4)    $\sum_k u_k \le 1$.

This follows directly from the fact that for fixed j and n the quantities
$p_{jk}^{(n)}$ (k = 1, 2, ...) add to unity, so that $u_1 + u_2 + \cdots + u_N \le 1$ for
every N. Now put n = 1 in (3.3) and let $m \to \infty$. The left side
tends to $u_k$, and the general term of the sum on the right side tends to
$u_\nu p_{\nu k}$. Adding an arbitrary finite number of terms, we see that

(6.5)    $u_k \ge \sum_\nu u_\nu p_{\nu k}$.

Summing these inequalities over all k, we obtain the finite quantity
$\sum u_k$ on each side. This shows that in (6.5) the inequality is impossible
and therefore

(6.6)    $u_k = \sum_\nu u_\nu p_{\nu k}$.

Putting $v_k = u_k \cdot (\sum_j u_j)^{-1}$ we see that $\{v_k\}$ is a stationary distribution
and hence at least one such distribution exists.

Let $\{v_k\}$ be any distribution satisfying equations (6.1). Multiplying
(6.1) by $p_{jk}^{(n)}$ and adding over j we see by induction that for each n

(6.7)    $v_k = \sum_\nu v_\nu p_{\nu k}^{(n)}$.

Letting $n \to \infty$ we get

(6.8)    $v_k = (v_1 + v_2 + \cdots) u_k = u_k$.

This completes the proof of assertion (b). If the states are transient
or null states and $\{v_k\}$ is a stationary distribution, then equations (6.7)
hold and $p_{\nu k}^{(n)} \to 0$, which is clearly impossible. Accordingly, a stationary
distribution can exist only in the ergodic case, and the proof
is completed.
Examples. (a) The Ehrenfest model. In example (2.f), the conditions
(6.1) for a stationary distribution take on the form

(6.9)    $u_k = \left(1 - \frac{k-1}{a}\right) u_{k-1} + \frac{k+1}{a}\, u_{k+1}$    (k = 1, ..., a−1),
         $u_0 = \frac{u_1}{a}, \qquad u_a = \frac{u_{a-1}}{a}$.

It is easily verified that the solution is given by the binomial distribution

(6.10)    $u_k = \binom{a}{k} 2^{-a}$.
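The verification that the binomial distribution (6.10) solves the stationarity conditions (6.9) of the Ehrenfest model can also be carried out mechanically. The sketch below is an illustration only. It assumes the transition probabilities implied by the coefficients in (6.9), namely a step from k to k−1 with probability k/a and to k+1 with probability 1 − k/a. The function names and the value a = 6 are hypothetical choices. Exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction
from math import comb


def ehrenfest_matrix(a):
    """Transition matrix on states 0..a: from k go to k-1 with
    probability k/a and to k+1 with probability 1 - k/a (assumed model)."""
    P = [[Fraction(0)] * (a + 1) for _ in range(a + 1)]
    for k in range(a + 1):
        if k > 0:
            P[k][k - 1] = Fraction(k, a)
        if k < a:
            P[k][k + 1] = 1 - Fraction(k, a)
    return P


def is_stationary(u, P):
    """Check u_k = sum_j u_j p_jk exactly for every k."""
    n = len(P)
    return all(u[k] == sum(u[j] * P[j][k] for j in range(n)) for k in range(n))


a = 6  # illustrative number of molecules
P = ehrenfest_matrix(a)
u = [Fraction(comb(a, k), 2**a) for k in range(a + 1)]

print(sum(u))               # total probability of the candidate distribution
print(is_stationary(u, P))  # exact check of the stationarity equations
```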


, each molecule h 


in the first container. This is a typical exam 


physical significance. 
Fo 


r large 
shows that, once the limiti i 


usly the solution v, = /a. Tt foll 
that, if a finite reducible aperiodi i mitts 
; periodic chain has a do 7 
P, then v, = 1/a (i.e., in the limit all stai Ae er ne 


am 
or null states. To prove this assertion suppose that (6.2) holds. The
matrix P being doubly stochastic, we have for each fixed k and arbitrarily
large N

(6.11)    $1 = \sum_{j=1}^{\infty} p_{jk}^{(n)} \ge \sum_{j=1}^{N} p_{jk}^{(n)} \to N u_k$

and this clearly implies $u_k = 0$ against the assumption. (This proof
applies also in the periodic case.)

(c) Recurrent events. In example (2.l) we have introduced a Markov
chain associated with an arbitrary recurrent event $\mathcal{E}$, and we proceed
now to show that (as could be expected) the states of the chain are
always of the same type as $\mathcal{E}$.

First consider the case of a transient $\mathcal{E}$; that is, suppose f < 1. The
chain of transitions $E_i \to E_{i+1} \to E_{i+2} \to \cdots \to E_{i+n}$ has probability

(6.12)    $\frac{1 - s_{i+1}}{1 - s_i} \cdot \frac{1 - s_{i+2}}{1 - s_{i+1}} \cdots \frac{1 - s_{i+n}}{1 - s_{i+n-1}} = \frac{1 - s_{i+n}}{1 - s_i} \ge \frac{1 - f}{1 - s_i}$.

The probability that the system will never enter $E_0$ is thus seen to be
positive, and therefore all states are transient. On the other hand,
when f = 1, the left-hand term in (6.12) tends to zero; with probability
one the system will sooner or later pass through $E_0$. It follows that $E_0$
is persistent, and since every state can be reached from $E_0$, the chain
is irreducible. We see thus: If $\mathcal{E}$ is transient, so are all states of the
chain; if $\mathcal{E}$ is persistent, then the chain is irreducible and all states are
persistent.

It is clear that the chain and $\mathcal{E}$ have the same period, and we shall
suppose that $\mathcal{E}$ is aperiodic and persistent. We have to decide whether
there exists a stationary distribution, that is, a probability distribution
$\{v_k\}$ satisfying (6.1). In the present case (6.1) reduces to

(6.13)    $v_0 = \sum_{i=0}^{\infty} v_i \frac{f_{i+1}}{1 - s_i}, \qquad v_k = v_{k-1} \frac{1 - s_k}{1 - s_{k-1}}$.

There exists a unique solution of these equations, namely

(6.14)    $v_k = (1 - s_k) v_0 = r_k v_0$, where $r_k = \sum_{n=k+1}^{\infty} f_n$.

In order that $\sum v_k < \infty$ it is necessary and sufficient that $\sum r_k < \infty$. But
$\sum r_k = \sum n f_n$ equals the mean recurrence time [cf. XI(1.8)]. This shows
that the states of the Markov chain are null states if the mean recurrence
time is infinite, and they have finite mean recurrence times if $\mathcal{E}$ has.




We have derived the asymptotic properties of Markov chains from 
similar properties of recurrent events. Now we have shown that each 
recurrent event may be described in terms of a particular Markov 


persistent null states. In principle this could 
the convergence of the series Zp), but in pr: 
be attacked directly, 


*7. PERIODIC CHAINS 
In the preceding section we have excluded the case of periodic chains,
but this was done only to avoid obse


nization of the asymptotic behavior of p® in 
irreducible periodie chai i i 


ave the 
same period t. Consider any t j and E; of an irreducible 
chain with period ¢. Since every state can be reached from every 


same remainder, 


Accordingly, for fixed E; 
r (where 0 <» <¢— 3) 


steps, Choosing j = 1, we get a 
-1 Gy so that Ep 
+ nt. We order the G, 
Ts, 
case $\nu = t - 1$); a two-step transition will lead to a state in $G_{\nu+2}$ (from
$G_{t-2}$ it leads to $G_0$, from $G_{t-1}$ to $G_1$), etc. Finally, a t-step transition
leads necessarily to a state belonging to the same group. This means
that, in a Markov chain whose matrix of transition probabilities is $P^t$,
each group $G_\nu$ forms a closed set. Since the original chain is irreducible,
each state can be reached from every other. This implies that in the
chain with transition probabilities $P^t$ each $G_\nu$ forms an irreducible closed
set. We have thus the


Theorem. In an irreducible periodic Markov chain the states can be
divided into t groups $G_0, \ldots, G_{t-1}$, so that a one-step transition from a
state of $G_\nu$ always leads to a state of $G_{\nu+1}$ (to $G_0$ if $\nu = t - 1$). If we
consider the chain only at times t, 2t, 3t, ..., then we get a new chain
whose matrix of transition probabilities is $P^t$. In it each $G_\nu$ forms an
irreducible closed set.
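The grouping described in the theorem can be computed for a finite chain. The sketch below is one standard way of doing so and is offered as an illustration only; the function name `cyclic_groups` and the two small example chains are assumptions, not taken from the text. It assumes the chain is irreducible, assigns each state its breadth-first distance from state 0, obtains the period t as a greatest common divisor, and places each state in the group given by its distance modulo t.

```python
from collections import deque
from math import gcd


def cyclic_groups(P):
    """Split the states of an irreducible chain into G_0, ..., G_{t-1}.

    P is a square matrix (list of rows); only the positions of its
    positive entries matter. Returns (t, groups).
    """
    n = len(P)
    # Breadth-first search from state 0 assigns a level to every state.
    level = {0: 0}
    queue = deque([0])
    while queue:
        j = queue.popleft()
        for k in range(n):
            if P[j][k] > 0 and k not in level:
                level[k] = level[j] + 1
                queue.append(k)
    # The period is the gcd of level[j] + 1 - level[k] over all steps j -> k.
    t = 0
    for j in range(n):
        for k in range(n):
            if P[j][k] > 0:
                t = gcd(t, level[j] + 1 - level[k])
    groups = [[] for _ in range(t)]
    for k in range(n):
        groups[level[k] % t].append(k)
    return t, groups


# Hypothetical examples: a symmetric walk on a ring of four states,
# and a deterministic cycle through three states.
ring = [[0, 0.5, 0, 0.5], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0.5, 0, 0.5, 0]]
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

print(cyclic_groups(ring))
print(cyclic_groups(cycle))
```

For an aperiodic chain the same routine returns t = 1 with all states in a single group.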


Our theorem contains complete information concerning the asymptotic
behavior of $p_{jk}^{(n)}$. If all states are transient or null states, then
$p_{jk}^{(n)} \to 0$ for every pair j, k. Otherwise each state $E_k$ has a finite mean
recurrence time $\mu_k$. Suppose that $E_j$ belongs to $G_\nu$. On $G_\nu$ we have an
irreducible non-periodic Markov chain with transition probabilities $p_{jk}^{(t)}$,
and hence there exist the limits

(7.1)    $\lim_{n \to \infty} p_{jk}^{(nt)} = u_k$ if $E_k$ is in $G_\nu$, and $= 0$ otherwise,

where $u_k$ is the reciprocal mean recurrence time of $E_k$ in the new chain,
one step of which corresponds to t steps of the original chain. Thus

(7.2)    $u_k = \frac{t}{\mu_k}$.

Using (3.2), we find from (7.1),

(7.3)    $\lim_{n \to \infty} p_{jk}^{(nt+1)} = u_k$ if $E_k$ is in $G_{\nu+1}$, and $= 0$ otherwise.

Similarly, $p_{jk}^{(nt+2)} \to u_k$ if $E_k$ is in $G_{\nu+2}$, etc. In other words, for fixed
$E_j$ and $E_k$ the sequence $p_{jk}^{(n)}$ is asymptotically periodic; in it blocks of
t − 1 consecutive zeros alternate with a positive element which converges
to $u_k = t/\mu_k$.

By the theorem of section 6, the $u_k$ within each group $G_\nu$ add to unity.
Since there are t blocks, it follows from (7.2) that the sequence $\{1/\mu_k\}$
represents a probability distribution. The argument of section 6
shows directly that this distribution is stationary and that no other stationary
distributions exist.
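The asymptotic periodicity expressed by (7.1) and (7.3) can be observed on a small example. The following Python sketch is only an illustration under stated assumptions: the chain is a hypothetical random walk on a ring of four states with p = 1/3 and q = 2/3, which has period t = 2 and groups {0, 2} and {1, 3}. Its matrix is doubly stochastic, so the stationary distribution {1/μ_k} is uniform, μ_k = 4, and the limits in (7.1) should equal t/μ_k = 1/2. The helper names `mat_mul` and `mat_pow` are likewise assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][v] * B[v][k] for v in range(n)) for k in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """n-th power of the square matrix P (n >= 0) by repeated squaring."""
    size = len(P)
    result = [[1.0 if i == k else 0.0 for k in range(size)] for i in range(size)]
    base = [row[:] for row in P]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Hypothetical chain: random walk on a ring of four states, period t = 2.
p, q = 1 / 3, 2 / 3
P = [[0.0] * 4 for _ in range(4)]
for j in range(4):
    P[j][(j + 1) % 4] = p
    P[j][(j - 1) % 4] = q

even = mat_pow(P, 200)  # 200 is a multiple of t
odd = mat_pow(P, 201)

print(even[0])  # mass only on the group containing state 0
print(odd[0])   # mass only on the other group
```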
8. TRANSIENT STATES 

From a persistent state $E_j$ the system can pass only into a persistent
$E_k$ in the closure of $E_j$, and we have obtained complete information
concerning the asymptotic behavior of $p_{jk}^{(n)}$ in this case.

If $E_j$ is transient and $E_k$ ergodic, then by (5.7)

(8.1)    $p_{jk}^{(n)} \to f_{jk}\, \mu_k^{-1}$

where $\mu_k$ is the mean recurrence time of $E_k$ and $f_{jk}$ the probability
that, starting from $E_j$, the system will sooner or later enter $E_k$. However,
$E_k$ belongs to an irreducible subchain C, and from $E_k$ the system
is bound to pass through each state of C. Therefore, for each fixed j
the probability $f_{jk}$ is the same for all states of C. In other words, if C
is an irreducible subchain with ergodic states and $E_j$ is transient, then
for each $E_k$ of C

(8.2)    $p_{jk}^{(n)} \to \mu_k^{-1} x_j$

where $x_j$ is the probability that, starting from $E_j$, the system will ever
enter C. Needless to say, for null states the right-hand side in
(8.2) must be replaced by 0, and the case of periodic $E_k$ necessitates
only the usual routine modification.


To complete the picture of the asymptotic behavior of Markov 
chains, it remains to solve the following 


Three Problems. (a) Given a transient state $E_j$ and a persistent
closed set C, find the probability $x_j$ that, starting from $E_j$, the system will
ever enter C (i.e., pass through a state of C).

(b) Find the probability $y_j$ that the system will forever remain in the
set of transient states.

(c) Given an irreducible chain, decide whether its states are transient or
persistent.

It will be seen presently that, after a slight reformulation, problem
(c) becomes a special case of (a).

Let T be the set of all transient states and suppose that the system
is initially in the transient state $E_j$; let $x_j^{(n)}$ be the probability that at time
n, and not sooner, the system reaches the closed set C. Then,

(8.3)    $x_j = \sum_{n=1}^{\infty} x_j^{(n)}$

is the probability that the system will ultimately reach C and stay in C.
By analogy with the simple random walk we shall call $x_j$ the probability
of absorption in C. The difference $1 - x_j$ accounts for the possibility
of absorption in other closed sets and (in the case of some infinite
chains) of an indefinite continuation in transient states.

It is clear that

(8.4)    $x_j^{(1)} = \sum_C p_{jk}$,

the summation extending over those k for which $E_k$ is contained in C.
If the system reaches C at the (n+1)st step, then the first step must
lead from $E_j$ to another transient state. It is therefore clear that

(8.5)    $x_j^{(n+1)} = \sum_T p_{j\nu} x_\nu^{(n)}$,

the summation now extending over those $\nu$ for which $E_\nu$ is transient.
Equations (8.4) and (8.5) are recurrence relations which uniquely determine
the $x_j^{(n)}$. Adding (8.5) for n = 1, 2, 3, ..., we find that the absorption
probabilities $x_j$ are solutions of the system of linear equations

(8.6)    $x_j - \sum_T p_{j\nu} x_\nu = x_j^{(1)}$.

We have thus an answer to problem (a); the probability $x_j$ can be
obtained constructively from (8.3)-(8.5), but it is preferable to characterize
it as a solution of the system of linear equations (8.6). In
this connection the problem of uniqueness arises, but it fortunately
turns out to be a special case of problem (b).
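The constructive route through (8.3)-(8.5) can be sketched in a few lines of Python. This is an illustration only: the function name `absorption_probabilities`, the number of terms summed, and the example (a random walk with absorbing barriers at 0 and a = 5 with p = 0.6) are assumptions. The series (8.3) is truncated after a large number of terms, and the result is then substituted into the linear system (8.6) to confirm that the residual is negligible.

```python
def absorption_probabilities(P, transient, target, n_terms=5000):
    """Approximate the series (8.3) using the recurrences (8.4) and (8.5).

    P         -- transition matrix as a list of rows
    transient -- indices of the transient states
    target    -- indices of the states forming the closed set C
    Returns (x, x1): dicts mapping each transient j to the truncated
    sum x_j and to the first term x_j^(1).
    """
    x1 = {j: sum(P[j][k] for k in target) for j in transient}  # (8.4)
    current = dict(x1)
    x = dict(x1)
    for _ in range(n_terms - 1):
        current = {j: sum(P[j][v] * current[v] for v in transient)  # (8.5)
                   for j in transient}
        for j in transient:
            x[j] += current[j]
    return x, x1


# Random walk with absorbing barriers at 0 and a (illustrative parameters).
a, p = 5, 0.6
q = 1 - p
P = [[0.0] * (a + 1) for _ in range(a + 1)]
P[0][0] = P[a][a] = 1.0
for j in range(1, a):
    P[j][j + 1] = p
    P[j][j - 1] = q

transient = list(range(1, a))
x, x1 = absorption_probabilities(P, transient, target=[0])

# Residual of the linear system (8.6): x_j - sum_T p_jv x_v - x_j^(1).
residual = max(abs(x[j] - sum(P[j][v] * x[v] for v in transient) - x1[j])
               for j in transient)
print(x)
print(residual)
```

For larger chains one would normally solve (8.6) directly as a linear system, as the text recommends; the iteration is shown here because it mirrors the recurrence relations step by step.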

Let $y_j^{(n)}$ be the probability that the system is at time n in a transient
state. Obviously

(8.7)    $y_j^{(1)} = \sum_T p_{j\nu}, \qquad y_j^{(n+1)} = \sum_T p_{j\nu} y_\nu^{(n)}$,

the summations again extending over all $\nu$ for which $E_\nu$ is transient.
It follows from (8.7) that $y_j^{(1)} \le 1$ and hence $y_j^{(2)} \le y_j^{(1)}$, and generally
$y_j^{(n+1)} \le y_j^{(n)}$. Therefore a limit

(8.8)    $y_j = \lim_{n \to \infty} y_j^{(n)}$

exists; $y_j$ is the probability of the system’s forever staying in transient
states. From (8.7) we have

(8.9)    $y_j = \sum_T p_{j\nu} y_\nu$.

The probabilities $y_j$ are thus seen to satisfy equations (8.9), but this
does not solve the main question, namely, whether or not $y_j = 0$ for
all j. Suppose that there exists a bounded solution of the system (8.9),
say

(8.10)    $z_j = \sum_T p_{j\nu} z_\nu, \qquad |z_j| \le 1$.

A comparison of (8.10) and (8.7) shows then that $|z_j| \le y_j^{(1)}$ and hence
by induction $|z_j| \le y_j^{(n)}$ for all n. It follows that $y_j = 0$ for all j if,
and only if, the system (8.10) has no non-zero solution. Finally, if
the linear equations (8.6) had two distinct solutions, their difference
would be a solution of the linear equations in (8.10). We have thus

Theorem 1. The probabilities $x_j$ of problem (a) are a solution of the
linear equations (8.6). This solution is unique except when there exists a
state $E_j$ such that, starting from $E_j$, the system [has a positive probability
of staying forever in transient states].

Note: We have seen that the probabilities $y_j$ may be characterized
as the maximal solution of (8.9) bounded by 1; a similar property
attaches to $\{x_j\}$.

Corollary. In a finite Markov chain the probability of the system’s
staying forever in the transient states is zero. The probabilities $x_j$ of passing
from a transient $E_j$ into a closed set C are determined as the unique
solution of the linear equations (8.6).

Proof. We have to prove that the equations (8.9) admit of no solu- 


e the maximum of the finitely
many $y_j$. There is no loss of generality in ordering the states so that
the $y_j$ appear in decreasing order, say that
$y_1 = y_2 = \cdots = y_a = M > y_{a+1} \ge y_{a+2} \ge \cdots$. From (8.9) we have then for $j \le a$

(8.11)    $M = \sum_T p_{j\nu} y_\nu = M \sum_{\nu=1}^{a} p_{j\nu} + \sum_{\nu \ge a+1} p_{j\nu} y_\nu$

and the equality sign can hold only if $p_{j\nu} = 0$ for each $\nu > a$. In this
case $E_1, \ldots, E_a$ form a closed set, and this


= 
Examples. (a) Random walk with absorbing barriers [example (2.b)].
Take for C the absorbing state $E_0$. Then $x_1^{(1)} = q$ and $x_j^{(1)} = 0$ if j > 1.
The system (8.6) therefore reduces to

(8.12)    $x_1 - p x_2 = q$,
          $x_j - q x_{j-1} - p x_{j+1} = 0$    (j = 2, 3, ..., a−2),
          $x_{a-1} - q x_{a-2} = 0$.

This is the same as the system XIV(2.1)-(2.2) and the solution is given
in XIV(2.4).

(b) Sequential sampling [example (2.j)]. Again let C be the state $E_0$.
Then z! = 7;, and the equations (8.6) reduce to XIV(8.2) (where us 
stands for the present ;; cf. also problem XIV, 4). 

(c) Genetics [example (2.k)]. Here each of the two states $E_0$ and
$E_{2N}$ forms a closed set. Absorption in $E_0$ and in $E_{2N}$ signifies, respectively,
that the population ultimately consists only of aa- or only of
AA-individuals. For the absorption in $E_0$ we have $x_j^{(1)} = p_{j0} = (1 - j/2N)^{2N}$,
and hence (8.6) assumes the form

(8.13)    $x_j - \sum_{\nu=1}^{2N-1} \binom{2N}{\nu} \left(\frac{j}{2N}\right)^{\nu} \left(1 - \frac{j}{2N}\right)^{2N-\nu} x_\nu = \left(1 - \frac{j}{2N}\right)^{2N}$.

It is plausible that at a moment when the A- and a-genes are in the
proportion j : 2N − j their survival chances should be in the same ratio.
If this is true, the solution to (8.13) must be $x_j = 1 - j(2N)^{-1}$. That
these $x_j$ really satisfy (8.13) is easily verified upon recognizing in (8.13)
the terms of the binomial distribution with mean j.
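The verification mentioned in the last sentence can be reproduced exactly with rational arithmetic. The sketch below is only an illustration: it assumes the binomial transition probabilities that appear as coefficients in (8.13), and the function names and the population size 2N = 8 are hypothetical choices. It substitutes the conjectured solution $x_j = 1 - j/(2N)$ into (8.13) for every transient state.

```python
from fractions import Fraction
from math import comb


def transition(j, v, two_n):
    """p_jv: binomial probability of v A-genes in the next generation,
    given j A-genes now (the coefficients appearing in (8.13))."""
    r = Fraction(j, two_n)
    return comb(two_n, v) * r**v * (1 - r) ** (two_n - v)


def satisfies_8_13(x, two_n):
    """Exact check of (8.13) for every transient state j = 1, ..., 2N - 1."""
    for j in range(1, two_n):
        lhs = x[j] - sum(transition(j, v, two_n) * x[v] for v in range(1, two_n))
        rhs = (1 - Fraction(j, two_n)) ** two_n
        if lhs != rhs:
            return False
    return True


two_n = 8  # illustrative population with 2N = 8 genes
x = {j: 1 - Fraction(j, two_n) for j in range(1, two_n)}
print(satisfies_8_13(x, two_n))
```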


Finally we give a solution to problem (c).


Theorem 2. Let an irreducible chain have states $E_0, E_1, \ldots$. In
order that the states be transient, it is necessary and sufficient that the system
of equations

(8.14)    $y_i = \sum_{j=1}^{\infty} p_{ij} y_j, \qquad i = 1, 2, \ldots$,

admits of a non-zero bounded solution.

Proof. In the construction (8.7)-(8.8) of $\{y_j\}$ interpret T as the set
of the states $E_1, E_2, \ldots$ (the complement of $E_0$). The proof applies
without change to this case, and it is seen that the probability of staying
in T (i.e., of not entering $E_0$) is given by (8.8) and satisfies (8.9).
Examples. (d) Unrestricted random walk. Example (2.e) requires
a trivial change of notations since the states are numbered from $-\infty$
to $+\infty$. It is clear, however, that our criterion depends on the existence
of solutions of the equations

(8.15)    $y_i = p y_{i+1} + q y_{i-1}, \qquad y_0 = 0, \qquad i = \pm 1, \pm 2, \ldots$.

Clearly all $y_i$ can be calculated recursively from $y_1$ and $y_{-1}$. If p > q,

(8.16)    $y_i = 1 - (q/p)^i, \qquad y_{-i} = 0, \qquad (i = 1, 2, \ldots)$

is the unique solution and is bounded. If p = q, the solution is
$y_i = i y_1$ and is unbounded. We have here a Markov chain derivation
of the old result that the states are transient if $p \ne q$, persistent if
p = q.

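(e) Consider the matrix

$P = \begin{pmatrix}
q_0 & p_0 & 0 & 0 & 0 & \cdots \\
q_1 & 0 & p_1 & 0 & 0 & \cdots \\
0 & q_2 & 0 & p_2 & 0 & \cdots \\
0 & 0 & q_3 & 0 & p_3 & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots
\end{pmatrix}$

which represents a random walk on $(0, \infty)$ with variable transition
probabilities. It plays an important role in the theory of birth-and-death
processes to be discussed in chapter XVII. The equations (8.14)
reduce to

(8.17)    $y_1 = p_1 y_2, \qquad y_i = q_i y_{i-1} + p_i y_{i+1}, \qquad i = 2, 3, \ldots$

and can be solved recursively since

(8.18)    $\frac{y_{i+1} - y_i}{y_i - y_{i-1}} = \frac{q_i}{p_i}$

and hence

(8.19)    $y_{i+1} - y_i = \frac{q_1}{p_1} \cdot \frac{q_2}{p_2} \cdots \frac{q_i}{p_i}\, y_1$.

Adding these equations, we see that a bounded solution exists if, and
only if, $\sum L_i < \infty$ where $L_i = (q_1 \cdots q_i)(p_1 \cdots p_i)^{-1}$. Therefore, the
states are transient if $\sum L_i < \infty$ and persistent otherwise.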
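The criterion can be explored numerically by tabulating partial sums of the $L_i$. Adding the equations (8.19) gives $y_{i+1} = y_1 (1 + L_1 + \cdots + L_i)$, so the partial sums decide whether the solution stays bounded. The Python sketch below is an illustration only: the function name `partial_sums_L` and the two choices of $p_i$, $q_i$ are assumptions. A finite table can suggest convergence or divergence but does not prove it.

```python
def partial_sums_L(p, q, n_terms):
    """Partial sums of L_i = (q_1 ... q_i) / (p_1 ... p_i), i = 1..n_terms.

    p and q are functions returning p_i and q_i for i >= 1.
    """
    sums, L, total = [], 1.0, 0.0
    for i in range(1, n_terms + 1):
        L *= q(i) / p(i)
        total += L
        sums.append(total)
    return sums


# Illustrative transition probabilities (assumptions, not from the text).
drift_out = partial_sums_L(lambda i: 2 / 3, lambda i: 1 / 3, 60)  # L_i = 2^(-i)
symmetric = partial_sums_L(lambda i: 1 / 2, lambda i: 1 / 2, 60)  # L_i = 1

print(drift_out[-1])  # partial sums level off: the series converges
print(symmetric[-1])  # partial sums keep growing: the series diverges
```

In the first case the states are transient, in the second persistent, in agreement with example (d).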
9. APPLICATION TO CARD SHUFFLING


A deck of N cards numbered 1, 2, ..., N can be arranged in N! different
orders, and each represents a possible state of the system. Every
particular shuffling operation effects a transition from the existing
state into some other state. For example, “cutting” will change the
order (1, 2, ..., N) into one of the N cyclically equivalent orders
(r, r+1, ..., N, 1, 2, ..., r−1). The same operation applied to the
inverse order (N, N−1, ..., 1) will produce (N−r+1, N−r, ..., 1, N,
N−1, ..., N−r+2). In other words, we conceive of each particular
shuffling operation as a transformation $E_j \to E_k$. If exactly the
same operation is repeated, the system will pass (starting from the
given state $E_j$) through a well-defined succession of states, and after
a finite number of steps the original order will be re-established. From
then on the same succession of states will recur periodically. For most
operations the period will be rather small, and in no case can all states
be reached by this procedure.⁸ For example, a perfect “lacing” would
change a deck of 2m cards from (1, ..., 2m) into (1, m+1, 2, m+2, ...,
m, 2m). With six cards four applications of this operation will re-establish
the original order. With ten cards the initial order will reappear
after six operations, so that repeated perfect lacing of a deck of ten
cards can produce only six out of the 10! = 3,628,800 possible orders.
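The periods quoted for perfect lacing are easy to reproduce. The short Python sketch below is an illustration; the function names are assumptions. It applies the lacing operation as defined above to a deck of even size until the original order returns and counts the applications.

```python
def perfect_lacing(deck):
    """Interleave the two halves: (1, ..., 2m) -> (1, m+1, 2, m+2, ..., m, 2m)."""
    m = len(deck) // 2
    laced = []
    for top, bottom in zip(deck[:m], deck[m:]):
        laced.extend((top, bottom))
    return laced


def lacings_to_restore(n_cards):
    """Number of perfect lacings after which a deck of even size
    first returns to its original order."""
    original = list(range(1, n_cards + 1))
    deck = perfect_lacing(original)
    count = 1
    while deck != original:
        deck = perfect_lacing(deck)
        count += 1
    return count


print(lacings_to_restore(6))   # 4, as stated in the text
print(lacings_to_restore(10))  # 6, as stated in the text
```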

In practice the player may wish to vary the operation, and at any
rate accidental variations will be introduced by chance. We shall
assume that we can account for the player’s habits and the influence
of chance variations by assuming that every particular operation has
a certain probability (possibly zero). We need assume nothing about
the numerical values of these probabilities but shall suppose that the
player operates without regard to the past and does not know the order
of the cards.⁹ This implies that the successive operations correspond
to independent trials with fixed probabilities; for the actual deck of
cards we then have a Markov chain.

We now show that the matrix P of transition probabilities is doubly
stochastic [example (6.b)]. In fact, if an operation changes a state
(order of cards) $E_j$ to $E_k$, then there exists another state $E_r$ which it
will change into $E_j$. This means that the elements of the jth column
of P are identical with the elements of the jth row, except that they
appear in a different order. All column sums are therefore unity.

⁸ In the language of group theory this amounts to saying that the permutation
group is not cyclic and can therefore not be generated by a single operation.

⁹ This assumption corresponds to the usual situation at bridge. It is easy to
devise more complicated shuffling techniques in which the operations depend on
previous operations and the final outcome is not a Markov chain [cf. example
(10.e)].

It follows that no state can be transient. If the chain is irreducible 
and aperiodic, then in the limit all states become equally probable. In 
other words, any kind of shuffling will do, provided only that it pro- 
duces an irreducible and aperiodic chain. It is safe to assume that this 
is usually the case. Suppose, however, that the deck contains an even 
number of cards and the procedure consists in dividing them equally 
into two parts and shuffling them separately by any method. If the 
two parts are put together in their original order, then the Markov 
chain is reducible (since not every state can be reached from every 
other state). If the order of the two parts is inverted, the chain will 
have period 2. Thus both contingencies can arise in theory, but hardly 
in practice, since chance precludes perfect regularity. 

It is seen that continued shuffling may reasonably be expected to
produce perfect “randomness” and to eliminate all traces of the original
order. It should be noted, however, that the number of operations
required for this purpose is extremely large.¹⁰


10. THE GENERAL MARKOV PROCESS 


chastic Process," or in other wo 
variables ® (X> Xo 2); 


P random process” are On: 
practically all the theory of probability from coin tossing to harkai SEE 


In is used mostly whi i 
EA y when a time parameter 


“This formulation refers to 


an infinite product space, but i i 
concerned only with joint distri aa aT Malia 


butions of finite collections of the variables, 
time. In chapter XVII we shall get a glimpse of more general stochastic
processes in which the time parameter is permitted to vary continuously.
The term “Markov process” is applied to a very large and important
class of stochastic processes (with both discrete and continuous
time parameters). Even in the discrete case there exist more general
Markov processes than the simple chains we have studied so far. It
will, therefore, be useful to give a definition of the Markov property,
to point out the special condition characterizing our Markov chains,
and, finally, to give a few examples of non-Markovian processes.

Conceptually, a Markov process is the probabilistic analogue of the
processes of classical mechanics, where the future development is completely
determined by the present state and is independent of the way
in which the present state has developed. The processes of mechanics
are in contrast to processes with aftereffect (or hereditary processes),
such as occur in the theory of plasticity, where the whole past history
of the system influences its future. In stochastic processes the future
is never uniquely determined, but we have at least probability relations
enabling us to make predictions. For the Markov chains studied in
this chapter it is clear that probability relations relating to the future
depend on the present state, but not on the manner in which the present
state has emerged from the past. In other words, if two independent
systems subject to the same transition probabilities happen to be
in the same state, then all probabilities relating to their future developments
are identical. This is a rather vague description which is formalized
in the following


Definition. A sequence of discrete-valued random variables is a
Markov process if, for every finite collection of integers
$n_1 < n_2 < \cdots < n_r < n$, the joint distribution of
$(X^{(n_1)}, X^{(n_2)}, \ldots, X^{(n_r)}, X^{(n)})$ is defined
in such a way that the conditional probability of the relation $X^{(n)} = x$ on
the hypothesis $X^{(n_1)} = x_1, \ldots, X^{(n_r)} = x_r$ is identical with the conditional
probability of $X^{(n)} = x$ on the single hypothesis $X^{(n_r)} = x_r$. Here
$x_1, \ldots, x_r$ are arbitrary numbers for which the hypothesis has a positive
probability.


Reduced to simpler terms, this definition states that, given the state
$x_r$ at time $n_r$, no additional data concerning states of the system at
previous times can alter the (conditional) probability of the state x at
a future time n.

The Markov chains studied in this chapter are obviously Markov
processes, but they have the following additional property not implied
by the definition. For the Markov chains studied in the preceding sections
the transition probabilities $p_{jk} = P\{X^{(m+1)} = k \mid X^{(m)} = j\}$ are independent
of m. The more general transition probabilities

(10.1)    $p_{jk}^{(n-m)} = P\{X^{(n)} = k \mid X^{(m)} = j\}$    (m < n)

then depend only on the difference n − m. We say in this case that
the transition probabilities are stationary (or constant). For a general
integral-valued Markov chain the right side in (10.1) depends on m
and n. We shall denote it by $p_{jk}(m, n)$ so that $p_{jk}(n, n+1)$ is the one-step
transition probability at time n. Instead of (1.1) we get now for
the probability of the path $(j_0, j_1, \ldots, j_n)$ the expression

(10.2)    $a_{j_0}\, p_{j_0 j_1}(0, 1)\, p_{j_1 j_2}(1, 2) \cdots p_{j_{n-1} j_n}(n-1, n)$.

The proper generalization of (3.3) is obviously the identity

(10.3)    $p_{jk}(m, n) = \sum_\nu p_{j\nu}(m, r)\, p_{\nu k}(r, n)$

which is valid for all r with m < r < n. This identity follows directly
from the definition of a Markov process and also from (10.2); it is
called the Chapman-Kolmogorov equation.
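For a finite number of states the identity (10.3) is a statement about products of the one-step matrices, and it can be checked directly. The Python sketch below is an illustration under stated assumptions: the two-state chain with time-dependent one-step probabilities is hypothetical, and the helper names are not from the text. It computes $p_{jk}(0, 3)$ in two ways, as a sum over paths of the products appearing in (10.2) (without the initial factor $a_{j_0}$), and through an intermediate time r as in (10.3).

```python
from fractions import Fraction as F
from itertools import product


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][v] * B[v][k] for v in range(n)) for k in range(n)]
            for i in range(n)]


def transition(one_step, m, n):
    """Matrix of p_jk(m, n) for m < n, built from the one-step
    matrices one_step[s] = (p_jk(s, s+1))."""
    result = one_step[m]
    for s in range(m + 1, n):
        result = mat_mul(result, one_step[s])
    return result


def by_paths(one_step, j, k, n):
    """p_jk(0, n) as the sum over all paths j = j_0, ..., j_n = k of the
    products p(0,1) p(1,2) ... p(n-1,n) from (10.2)."""
    size = len(one_step[0])
    total = F(0)
    for middle in product(range(size), repeat=n - 1):
        path = (j, *middle, k)
        prob = F(1)
        for s in range(n):
            prob *= one_step[s][path[s]][path[s + 1]]
        total += prob
    return total


# Hypothetical two-state chain whose one-step probabilities change with time.
one_step = [
    [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]],    # time 0 -> 1
    [[F(1, 3), F(2, 3)], [F(1), F(0)]],          # time 1 -> 2
    [[F(9, 10), F(1, 10)], [F(2, 5), F(3, 5)]],  # time 2 -> 3
]

direct = [[by_paths(one_step, j, k, 3) for k in range(2)] for j in range(2)]
via_r = {r: mat_mul(transition(one_step, 0, r), transition(one_step, r, 3))
         for r in (1, 2)}
print(all(via_r[r] == direct for r in (1, 2)))
```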

In the present chapter we have dealt mostly with the asymptotic
behavior of the higher transition probabilities, and few of the established
properties are common to the most general discrete Markov
process. We shall, therefore, not dwell on the general theory.


Examples of Non-Markovian Processes. (a) The Polya urn
scheme [example V(2.c)]. Let $X^{(n)}$ equal 1 or 0 according to whether
the nth drawing results in a black or red ball. The sequence $\{X^{(n)}\}$ is
not a Markov process. For example,

    $P\{X^{(3)} = 1 \mid X^{(2)} = 1\} = (b + c)/(b + r + c)$,

but

    $P\{X^{(3)} = 1 \mid X^{(2)} = 1, X^{(1)} = 1\} = (b + 2c)/(b + r + 2c)$.

(Cf. problems V, 19-20.) On the other hand, if $Y^{(n)}$ is the number of
black balls in the urn at time n, then $\{Y^{(n)}\}$ is an ordinary Markov
chain with constant transition probabilities.

(b) Higher sums. Let $Y_0, Y_1, \ldots$ be mutually independent random
variables, and put $S_n = Y_0 + \cdots + Y_n$. The difference $S_n - S_m$ (with
m < n) depends only on $Y_{m+1}, \ldots, Y_n$, and it is therefore easily seen
that the sequence $\{S_n\}$ is a Markov process. Now let us go one step
further and define a new sequence of random variables $U_n$ by
$U_n = S_0 + S_1 + \cdots + S_n$ (which means that
$U_n = Y_n + 2Y_{n-1} + 3Y_{n-2} + \cdots$).
The sequence $\{U_n\}$ forms a stochastic process whose probability relations
can, in principle, be expressed in terms of the distributions of the
$Y_k$. The $\{U_n\}$ process is in general not of the Markov type, since there
is no reason why, for example, $P\{U_n = 0 \mid U_{n-1} = a\}$ should be the
same as $P\{U_n = 0 \mid U_{n-1} = a, U_{n-2} = b\}$; the knowledge of $U_{n-1}$ and
$U_{n-2}$ permits better predictions than the sole knowledge of $U_{n-1}$.
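A concrete instance of this failure of the Markov property can be found by exhaustive enumeration. The Python sketch below is an illustration only. It assumes that $Y_0, \ldots, Y_3$ are independent and take the values 0 and 1 with probability 1/2 each, and it uses the particular values 4, 3, and 2 rather than the 0, a, b of the text. It compares the conditional probability of $U_3 = 4$ given $U_2 = 3$ with the same probability when $U_1 = 2$ is also known.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration: Y_0, ..., Y_3 are fair 0/1 variables,
# so each of the 16 sequences has probability 1/16.
outcomes = []
for ys in product((0, 1), repeat=4):
    s, u, path = 0, 0, []
    for y in ys:
        s += y          # S_n = Y_0 + ... + Y_n
        u += s          # U_n = S_0 + ... + S_n
        path.append(u)  # path[n] = U_n
    outcomes.append(path)


def conditional(event, given):
    """P{event | given} by counting equally likely outcomes."""
    pool = [path for path in outcomes if given(path)]
    return Fraction(sum(1 for path in pool if event(path)), len(pool))


one_step_back = conditional(lambda u: u[3] == 4, lambda u: u[2] == 3)
two_steps_back = conditional(lambda u: u[3] == 4,
                             lambda u: u[2] == 3 and u[1] == 2)

print(one_step_back)   # 1/4
print(two_steps_back)  # 1/2
```

The two values differ because $U_2 = 3$ can arise from two different histories, and only the earlier value $U_1$ reveals which current sum $S_2$ is in force.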

In the case of a continuous time parameter the preceding summations
are replaced by integrations. In diffusion theory the $Y_n$ play the role
of accelerations; the $S_n$ are then velocities, and the $U_n$ positions. If
only positions can be measured, we are compelled to study a non-Markovian
process, even though it is indirectly defined in terms of a
Markov process.

(c) Moving averages. Again let $\{Y_n\}$ be a sequence of mutually
independent random variables. Moving averages of order r are defined
by $X^{(n)} = (Y_n + Y_{n+1} + \cdots + Y_{n+r-1})/r$. It is easily seen that the
$X^{(n)}$ are not a Markov process. Processes of this type are common in
many applications (cf. problem 26).

(d) A traffic problem. For an empirical example of a non-Markovian
process R. Fürth ¹³ made extensive observations on the number of
pedestrians on a certain segment of a street. An idealized mathematical
model of this process can be obtained in the following way. For
simplicity we assume that all pedestrians have the same speed v; also,
we consider only pedestrians moving in one direction. At time t = 0
we divide the positive x-axis into segments of fixed length $\delta$, each of
which may or may not contain a pedestrian. We suppose that the
distribution of pedestrians in our segments is determined by a sequence
of Bernoulli trials. In other words, we have a sequence of independent
random variables $Y_k$, each of which assumes the values 1 or 0 with
probabilities p and q, respectively. The segment $(k - 1)\delta < x < k\delta$
contains a pedestrian if $Y_k = 1$. Let now the whole axis move with
velocity v in the negative direction, and let us observe the number of
pedestrians in the fixed interval of length $N\delta$, which at time t = 0 is
covered by the interval $0 < x < N\delta$ of the moving x-axis. At time t
this fixed interval is covered by the interval $vt < x < vt + N\delta$ of the
x-axis. Let observations be made at times $n\delta/v$ and let $X^{(n)}$ be the
number of pedestrians in our fixed interval observed at time n. Then
$X^{(n)} = Y_n + Y_{n+1} + \cdots + Y_{n+N-1}$, so that our process is, except for
the factor 1/N, a moving average process. It is therefore non-Markovian.
(Passing to the limit $\delta \to 0$, we obtain a continuous model,
in which a Poisson distribution takes over the role of the binomial
distribution.)

¹³ R. Fürth, Schwankungserscheinungen in der Physik, Sammlung Vieweg,
Braunschweig, 1920, pp. 17ff. The original observations appeared in Physikalische
Zeitschrift, vols. 19 (1918) and 20 (1919).

(e) Superposition of Markov processes (composite shuffling). There
exist many technical devices (such as groups of selectors in telephone
exchanges, counters, filters) whose action can be described as a superposition
of two Markov processes with an output which is non-Markovian.
A fair idea of such mechanisms may be obtained from the study
of the following method of card shuffling.

In addition to the target deck of N cards we have an equivalent
auxiliary deck, and the usual shuffling technique is applied to this auxiliary
deck. If its cards appear in the order $(a_1, a_2, \ldots, a_N)$, we permute
the cards of the target deck so that the first, second, ..., Nth
cards are transferred to the places number $a_1, a_2, \ldots, a_N$. Thus the
shuffling of the auxiliary deck indirectly determines the successive
orderings of the target deck. The latter form a stochastic process which
is not of the Markov type. To prove this, it suffices to show that the
knowledge of two successive orderings of the target deck conveys in
general more clues to the future than the sole knowledge of the last
ordering. We show this in a simple special case.

Let N = 4, and suppose that the auxiliary deck is initially in the
order (2431). Suppose, furthermore, that the shuffling operation
always consists of a true “cutting,” that is, the ordering $(a_1, a_2, a_3, a_4)$
is changed into one of the three orderings $(a_2, a_3, a_4, a_1)$, $(a_3, a_4, a_1, a_2)$,
$(a_4, a_1, a_2, a_3)$; we attribute to each of these three possibilities probability
1/3. With these conventions the auxiliary deck will at any time
be in one of the four orderings (2431), (4312), (3124), (1243). On the
other hand, a little experimentation will show that the target deck will 
gradually pass through all 24 possible orderings and that each of them 


y ssible orderings of 
the auxiliary deck. This means that the ordering (1234) of the target 


of the four orderings (2431), (4312), (3124), (1243). Now the auxiliary 
deck can never remain in the same ordering, and hence the target deck 


3 ) and (1243), respectively, 
then at time n + 1 the state (1234) is impossible. Thus the knowledge 
*11. MISCELLANY


(a) Inverse Probabilities 


Although it is most natural to investigate the future development of
a system, it is occasionally necessary to study its past. Consider a
Markov chain with states $E_k$ and constant transition probabilities $p_{jk}$,
whose absolute probabilities at time $n$ are $a_k^{(n)} = \sum_\nu a_\nu p_{\nu k}^{(n)}$. The
conditional probability that the system was at time $m < n$ in state $E_j$, given
that at time $n$ it is in $E_k$, is (independently of the states at times after $n$)

(11.1)    $q_{kj}(n, m) = \dfrac{a_j^{(m)} \, p_{jk}^{(n-m)}}{a_k^{(n)}}, \qquad m < n.$

This formula makes sense only if $a_k^{(n)} > 0$; otherwise the conditional
probability in question is not defined. If all $a_k^{(n)}$ are positive, then
(11.1) defines a system of transition probabilities with all the properties
required for a Markov process. In particular, the $q_{kj}(n, m)$ satisfy the
Chapman-Kolmogorov identity (10.3) with the time direction reversed,
namely,

(11.2)    $q_{kj}(n, m) = \sum_\nu q_{k\nu}(n, r) \, q_{\nu j}(r, m)$

($m < r < n$). The $q_{kj}(n, m)$ are called inverse probabilities.¹⁴ Consider,
in particular, an irreducible chain with stationary probabilities $\{u_k\}$.
Then, if the initial distribution is $\{u_k\}$, we have $a_k^{(n)} = u_k$ for all $n$,
and $u_k > 0$ (cf. sections 6 and 7). In this
case the one-step transitions $q_{kj}(n+1, n)$ are independent of $n$ and
reduce to

(11.3)    $q_{kj} = \dfrac{u_j \, p_{jk}}{u_k}.$

The matrix $\{q_{kj}\}$ is stochastic, so that here the inverse probabilities
define a Markov chain with constant transition probabilities. If
$q_{jk} = p_{jk}$, the original chain is called reversible; its probability relations
are then symmetric in time.
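
The inversion formula (11.3) can be tried out on small chains. The following is only a sketch: both three-state matrices (`walk` and `cycle`) and their stationary distributions are hypothetical examples chosen for illustration, not chains taken from the text. Exact fractions are used so that the checks involve no rounding.

```python
from fractions import Fraction as F

def inverse_chain(P, u):
    """One-step inverse probabilities of (11.3): Q[k][j] = u[j] * P[j][k] / u[k]."""
    n = len(P)
    return [[u[j] * P[j][k] / u[k] for j in range(n)] for k in range(n)]

def is_stationary(P, u):
    """Check u = uP exactly."""
    n = len(P)
    return all(sum(u[j] * P[j][k] for j in range(n)) == u[k] for k in range(n))

# Hypothetical chain 1: a three-state random walk (illustration only).
walk = [[F(1, 2), F(1, 2), F(0)],
        [F(1, 4), F(1, 2), F(1, 4)],
        [F(0), F(1, 2), F(1, 2)]]
u_walk = [F(1, 4), F(1, 2), F(1, 4)]

# Hypothetical chain 2: a cyclic drift; its matrix is doubly stochastic,
# so the uniform distribution is stationary.
cycle = [[F(0), F(3, 4), F(1, 4)],
         [F(1, 4), F(0), F(3, 4)],
         [F(3, 4), F(1, 4), F(0)]]
u_cycle = [F(1, 3)] * 3

q_walk = inverse_chain(walk, u_walk)
q_cycle = inverse_chain(cycle, u_cycle)
```

In this sketch `q_walk` coincides with `walk`, so the first chain is reversible, whereas `q_cycle` is the transpose of `cycle`: the reversed chain drifts around the cycle in the opposite direction.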


(b) The Central Limit Theorem 


The theory of recurrent events contains further information concerning
Markov chains. Let $E_k$ be a fixed persistent state whose recurrence
time has finite variance $\sigma_k^2$ (this condition is always satisfied if the

* This section may be omitted at first reading. 

¹⁴ A. Kolmogoroff, Zur Theorie der Markoffschen Ketten, Mathematische Annalen,
vol. 112 (1935), pp. 155-160.
chain is finite). Let $N_n$ denote the number of passages up to time $n$
of the system through $E_k$. Then we know from chapter XIII, section
6, that the variable $N_n$ is asymptotically normally distributed. In the
notations of the present chapter we have $E(N_n) \sim n/\mu_k = n u_k$; a way
to calculate the variance in the case of finite chains will be indicated
in the next chapter. In particular, as $n \to \infty$, the probability tends
to one that $|N_n/n - u_k| < \epsilon$ for every arbitrary $\epsilon > 0$. This is the weak
law of large numbers for the number of passages through $E_k$. Similarly,
the strong law of large numbers and the law of the iterated logarithm
hold and require no special proof. In the case of an infinite chain, the
recurrence time of $E_k$ need not have a finite variance, even if its mean
is finite. However, the general limit theorems for recurrent events
apply in this case.


equal $x_k$ if at time $n$ the system is in state $E_k$. As usual, we put


(c) Non-stochastic Matrices 


The theorems of this chapter describe the asymptotic behavior of
the powers $P^n$ of an arbitrary stochastic matrix $P$, that is, of a matrix
whose elements satisfy the conditions (1.2). It is easy to generalize
these theorems to a more general class of matrices. Let $P$ be an arbitrary
(finite or infinite) matrix with non-negative elements and denote its row
sums by $S_j$, so that $S_j = \sum_k p_{jk}$. We assume that the sequence $S_j$ is bounded,
that is, that there exists a constant $M$ such that $S_j < M$. Under these
conditions the asymptotic behavior of $P^n$ is still described by our theorems.
Suppose first that no row sum exceeds unity. Then $P$ can be enlarged to a
stochastic matrix $Q$ by adding a row and a column number zero whose elements
are defined by $p_{00} = 1$, $p_{01} = p_{02} = \cdots = 0$, and $p_{j0} = 1 - S_j$ for $j \ge 1$.
The new matrix $Q$ is stochastic, and its asymptotic behavior is given by our
theorems. On the other hand, $P^n$ is the submatrix of $Q^n$ that remains when
row and column number zero are deleted. In
the general case the row sums $S_j$ may exceed unity, but we may replace
the matrix $P$ by the matrix $P^*$ whose elements are $p_{jk}/M$. The row
sums $S_j^*$ of $P^*$ satisfy the condition $S_j^* < 1$, and we are able to describe
the asymptotic behavior of the powers $P^{*n}$. However, the matrices
$P^n$ and $P^{*n}$ differ only by the factor $M^n$, so that our theorems
actually describe the asymptotic behavior of $P^n$ in all cases.

¹⁵ W. Doeblin, Sur les propriétés asymptotiques de mouvements régis par certains
types de chaînes simples, Thesis, Paris, 1937.

Matrices of the described type occur in the theory of generalized 
random walks with creation or destruction of masses. 


(d) Literature 


There exists a huge literature on finite Markov chains. A detailed
account of the various methods of attack and references to earlier work
will be found in the comprehensive treatise by M. Fréchet.¹⁶ An algebraic
treatment of finite chains will be described in the next chapter.
The entire theory of finite chains can be derived from Frobenius' theory
of matrices with positive elements. This method has been exploited
in particular by V. Romanovsky. Unfortunately these methods do
not carry over to the more interesting case of infinite chains, first
considered by A. Kolmogorov.¹⁷ His work was continued by W. Doeblin¹⁸
and J. L. Doob.¹⁹ The latter derived the ergodic properties from general
group theory. Recent papers by K. L. Chung²⁰ investigate in
particular transitions from one state to another when certain states


¹⁶ Recherches théoriques modernes sur le calcul des probabilités, vol. 2 (théorie des
événements en chaîne dans le cas d'un nombre fini d'états possibles), Paris, 1938.
Another monograph on Markov chains is due to B. Hostinsky, Méthodes générales
du calcul des probabilités, fasc. 52 of the Mémorial des sciences mathématiques, Paris.

¹⁷ Anfangsgründe der Theorie der Markoffschen Ketten mit unendlich vielen
möglichen Zuständen, Matematičeskii Sbornik, N.S., vol. 1 (1936), pp. 607-610.
This paper contains no proofs. A complete exposition was given only in Russian,
in Bulletin de l'Université d'État à Moscou, Sect. A, vol. 1 (1937), pp. 1-15.

¹⁸ Sur deux problèmes de M. Kolmogoroff concernant les chaînes dénombrables,
Bulletin Société Mathématique de France, vol. 66 (1939), pp. 1-11.

¹⁹ Topics in the theory of Markoff chains, and also Markoff chains - denumerable
case, Transactions American Mathematical Society, vol. 52 (1942), pp. 37-64,
and vol. 58 (1945), pp. 455-473.

²⁰ K. L. Chung, Contributions to the theory of Markov chains I, Journal of
Research, National Bureau of Standards, vol. 50 (1953), pp. 203-208, and II,
Transactions American Mathematical Society, vol. 76 (1954), pp. 397-419.
are forbidden. This leads in turn to more elegant formulas for the
ae C. Derman * that in an irreducible chain with 
null states the equations (6.1) for stationary distributions admit of a
unique solution $\{v_k\}$ such that $\sum v_k = \infty$. The inversion formula (11.3)
makes sense also for such solutions, and the modern theory pays increasing
attention to similar uses of unbounded solutions.²² Certain
very general classes of non-Markovian processes related to Markov
chains are treated systematically by T. E. Harris.²³


12. PROBLEMS FOR SOLUTION 


1. In a sequence of Bernoulli trials we say that at time $n$ the state $E_1$ is
observed if the trials number $n - 1$ and $n$ resulted in SS. Similarly $E_2$, $E_3$, $E_4$
stand for SF, FS, FF. Find the matrix $P$ and all its powers. Generalize the
scheme.

2. Classify the states for the four chains whose matrices $P$ have the rows
given below. Find in each case $P^2$ and the asymptotic behavior of $p_{jk}^{(n)}$.

(a) (0, 1/2, 1/2), (1/2, 0, 1/2), (1/2, 1/2, 0);

(b) (0, 0, 0, 1), (0, 0, 0, 1), (1/2, 1/2, 0, 0), (0, 0, 1, 0);

(c) (1/2, 0, 1/2, 0, 0), (1/4, 1/2, 1/4, 0, 0), (1/2, 0, 1/2, 0, 0), (0, 0, 0, 1/2, 1/2), (0, 0, 0, 1/2, 1/2);

(d) (0, 1/2, 1/2, 0, 0, 0), (0, 0, 0, 1/3, 1/3, 1/3), (0, 0, 0, 1/3, 1/3, 1/3), (1, 0, 0, 0, 0, 0),
(1, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0).

3. We consider throws of a true die and agree to say that at time $n$ the system
is in state $E_j$ if $j$ is the highest number appearing in the first $n$ throws.
Find the matrix $P^n$ and verify that formula (3.3) holds.


4. In example (2.0) find the (absorption) probabilities $x_k$ and $y_k$ that, starting
from $E_k$, the system will end in $E_1$ or $E_5$, respectively ($k = 2, 3, 4, 6$).
(Do this problem from the basic definitions without referring to the formulas
of section 8.)


5. Treat example I(5.b) as a Markov chain. Calculate the probability of
winning for each player.


6. The first row of $P$ is $\{p_1, p_2, \ldots\}$. In the following rows $p_{j,j-1} = 1$, all
other entries being zero. Discuss the character of the states and find the
stationary distribution, if any.


7. The first column of $P$ is $\{q_0, q_1, \ldots\}$ and $p_{i,i+1} = 1 - q_i$ for $i = 0, 1, \ldots$.
Prove that the states are transient if, and only if, $\sum q_i < \infty$. When are the
states null states? Find the stationary distribution, if any.

8. One reflecting barrier. Consider the random-walk matrix with $p_{k,k+1} = p$,
$p_{k,k-1} = q$ for $k = 2, 3, \ldots$ and $p_{12} = p$, $p_{11} = q$. Prove that the states are


²¹ C. Derman, A solution to a set of fundamental equations in Markov chains,
Proceedings American Mathematical Society, vol. 5 (1954), pp. 332-334.

²² W. Feller, Boundaries induced by positive matrices, Transactions American
Mathematical Society, vol. 83 (1956), pp. 19-54.

²³ T. E. Harris, On chains of infinite order, Pacific Journal of Mathematics, vol.
5 (1955), Supplement 1, pp. 707-724.
transient if $p > q$, persistent null states if $p = q$, and ergodic if $p < q$. Find
the stationary distribution.

9. Two reflecting barriers. A chain with states $1, 2, \ldots, a$ has a matrix
whose first and last rows are $(q, p, 0, \ldots, 0)$ and $(0, \ldots, 0, q, p)$. In all other
rows $p_{k,k+1} = p$, $p_{k,k-1} = q$. Find the stationary distribution. Can the chain
be periodic?

10. $N$ black and $N$ white balls are placed in two urns so that each urn contains
$N$ balls. The number of black balls in the first urn is the state of the
system. At each step one ball is selected at random from each urn, and the
two balls thus selected are interchanged. Find the $p_{jk}$. Show that in the
limiting distribution the term $u_k$ equals the probability of getting exactly $k$
black balls if $N$ balls are selected at random out of a collection of $N$ black and
$N$ white balls.²⁴


11. A chain with states $E_0, E_1, \ldots$ has transition probabilities

$p_{jk} = e^{-\lambda} \sum_{\nu=0}^{j} \binom{j}{\nu} p^{\nu} q^{j-\nu} \dfrac{\lambda^{k-\nu}}{(k-\nu)!}$

(with $p = 1 - q$), where the terms in the sum should be replaced by zero if $\nu > k$. Show that

$p_{jk}^{(n)} \to e^{-\lambda/q} \dfrac{(\lambda/q)^k}{k!}.$

Note: This chain occurs in statistical mechanics²⁵ and can be interpreted as
follows. The state of the system is defined by the number of particles in a
certain volume of space. During each time interval of unit length each particle
has probability $q$ to leave the volume, and the particles are stochastically
independent. Moreover, new particles may enter the volume, and the probability
of $r$ entrants is given by the Poisson expression $e^{-\lambda} \lambda^r / r!$. The stationary
distribution is then a Poisson distribution with parameter $\lambda/q$.
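
The claim in the note can be checked numerically before attempting a proof. The following sketch evaluates the transition probabilities of problem 11 and tests whether the Poisson distribution with parameter $\lambda/q$ is reproduced after one step. The values `lam = 1.5`, `q = 0.4` and the truncation point `J = 80` of the infinite state space are arbitrary assumptions made only for this illustration.

```python
from math import comb, exp, factorial

def p_jk(j, k, lam, q):
    """Transition probability of problem 11, with p = 1 - q.

    nu of the j particles stay (probability p each), k - nu new ones enter.
    """
    p = 1.0 - q
    total = 0.0
    for nu in range(0, min(j, k) + 1):   # terms with nu > k are zero
        total += (comb(j, nu) * p**nu * q**(j - nu)
                  * lam**(k - nu) / factorial(k - nu))
    return exp(-lam) * total

def poisson(k, mean):
    """Poisson probability e^{-mean} mean^k / k!."""
    return exp(-mean) * mean**k / factorial(k)

lam, q = 1.5, 0.4      # hypothetical parameter values
mean = lam / q
J = 80                 # truncation of the infinite state space (assumption)

# |sum_j u_j p_jk - u_k| for the candidate stationary distribution u.
residuals = [abs(sum(poisson(j, mean) * p_jk(j, k, lam, q) for j in range(J))
                 - poisson(k, mean))
             for k in range(10)]
row_sum = sum(p_jk(3, k, lam, q) for k in range(60))
```

With these values every entry of `residuals` is at the level of rounding error, which is consistent with the statement of the note; it is of course no substitute for the proof asked for in the problem.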


12. Ehrenfest model. In example (2.f) let there initially be $j$ molecules in
the first container, and let $X^{(n)} = 2k - a$ if at time $n$ the system is in state
$k$ (so that $X^{(n)}$ is the difference of the number of molecules in the two
containers). Let $e_n = E(X^{(n)})$. Prove that $e_{n+1} = (a - 2)e_n/a$, whence
$e_n = (1 - 2/a)^n (2j - a)$. (Note that $e_n \to 0$ as $n \to \infty$.)
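
The recursion for $e_n$ can be verified exactly on a small case. The sketch below assumes the usual Ehrenfest transition rule (from state $k$ the chain moves to $k - 1$ with probability $k/a$ and to $k + 1$ with probability $(a - k)/a$); example (2.f) is not reproduced in this section, so this rule is an assumption. The values `a = 6` and `j = 5` are arbitrary.

```python
from fractions import Fraction as F

def ehrenfest_step(dist, a):
    """One step of the assumed rule: k -> k-1 w.p. k/a, k -> k+1 w.p. (a-k)/a."""
    new = [F(0)] * (a + 1)
    for k, weight in enumerate(dist):
        if weight == 0:
            continue
        if k > 0:
            new[k - 1] += weight * F(k, a)
        if k < a:
            new[k + 1] += weight * F(a - k, a)
    return new

def expected_difference(dist, a):
    """E(X) where X = 2k - a in state k."""
    return sum(weight * (2 * k - a) for k, weight in enumerate(dist))

a, j = 6, 5            # hypothetical number of molecules and initial state
dist = [F(int(k == j)) for k in range(a + 1)]
e = []
for n in range(8):
    e.append(expected_difference(dist, a))
    dist = ehrenfest_step(dist, a)

predicted = [(1 - F(2, a)) ** n * (2 * j - a) for n in range(8)]
```

The list `e` of exactly computed expectations agrees term by term with `predicted`, that is, with $(1 - 2/a)^n (2j - a)$.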

13. Treat the counter problem, example XIII(1.b), as a Markov chain. 


14. Plane random walk with reflecting barriers. Consider a symmetric random
walk in a bounded region of the plane. The boundary is reflecting in
the sense that, whenever in an unrestricted random walk the particle would
leave the region, it is forced to return to the last position. Show that, if every
point of the region can be reached from every other point, there exists a
stationary distribution and that $u_k = 1/a$, where $a$ is the number of positions in
the region.

15. Repeated averaging. Let $\{x_1, x_2, \ldots\}$ be a bounded sequence of numbers
and $P$ the matrix of an ergodic chain. Prove that $\sum_j p_{ij}^{(n)} x_j \to \sum_j u_j x_j$.
Show that the repeated averaging procedure of chapter XIII, section 4, and
of problem XIII, 5 is a special case.

²⁴ This problem goes back to Laplace; see Fréchet's book (cited in footnote 16),
p. 49.

²⁵ S. Chandrasekhar, Stochastic problems in physics and astronomy, Reviews of
Modern Physics, vol. 15 (1943), pp. 1-89, in particular p. 45.

16. In the theory of waiting lines we encounter the chain matrix 


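$\begin{pmatrix} p_0 & p_1 & p_2 & p_3 & \cdots \\ p_0 & p_1 & p_2 & p_3 & \cdots \\ 0 & p_0 & p_1 & p_2 & \cdots \\ 0 & 0 & p_0 & p_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{pmatrix}$


where $\{p_k\}$ is a probability distribution. Using generating functions, discuss
the character of the states. Find the generating function of the stationary
distribution, if any.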

17. Waiting time to absorption. For transient $E_j$ let $Y_j$ be the time when
the system for the first time passes into a persistent state. Assuming that the
probability of staying forever in transient states is zero, prove that $d_j = E(Y_j)$
is uniquely determined as the solution of the system of linear equations

$d_j - \sum_\nu p_{j\nu} d_\nu = 1,$

the summation extending over all $\nu$ such that $E_\nu$ is transient. However, $d_j$
need not be finite.
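
For a finite chain the system of problem 17 can be solved directly. As an illustration only, the sketch below takes a symmetric random walk on $0, 1, \ldots, a$ with absorbing barriers at $0$ and $a$ (a hypothetical example with `a = 6`), sets up the equations for the transient states $1, \ldots, a - 1$, and solves them by elimination over the rationals.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

a = 6                                   # absorbing barriers at 0 and a (assumption)
transient = list(range(1, a))
half = F(1, 2)

# Coefficients of d_j - sum_nu p_{j nu} d_nu = 1, nu ranging over transient states.
A = [[F(int(j == nu)) - (half if abs(j - nu) == 1 else 0) for nu in transient]
     for j in transient]
d = solve(A, [F(1)] * len(transient))
```

For this walk the solution is $d_j = j(a - j)$, the familiar expected duration of the symmetric ruin problem.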


18. If the number of states is $a < \infty$ and if $E_k$ can be reached from $E_j$,
then it can be reached in $a$ steps or less.


19. Let the chain contain $a$ states and let $E_j$ be persistent. There exists
a number $q < 1$ such that for $n \ge a$ the probability of the recurrence time of
$E_j$ exceeding $n$ is smaller than $q^n$. (Hint: Use problem 18.)

20. In a finite chain $E_j$ is transient if and only if there exists an $E_k$ such that
$E_k$ can be reached from $E_j$ but not $E_j$ from $E_k$. (For infinite chains this is
false, as shown by problem 7.)


21. An irreducible chain for which one diagonal element $p_{jj}$ is positive cannot
be periodic.


22. A finite irreducible chain is non-periodic if and only if there exists an $n$
such that $p_{jk}^{(n)} > 0$ for all $j$ and $k$.


23. In a chain with $a$ states let $(x_1, \ldots, x_a)$ be a solution of the system of
linear equations $x_j = \sum_\nu p_{j\nu} x_\nu$. Prove: (a) the states $E_r$ for which $x_r > 0$ form
a closed (not necessarily irreducible) set; (b) if $E_j$ and $E_k$ belong to the same
irreducible set, then $x_j = x_k$.

24. Continuation. If $(x_1, \ldots, x_a)$ is a solution of $x_j = s \sum_\nu p_{j\nu} x_\nu$ with $|s| = 1$
but $s \ne 1$, then there exists an integer $t > 1$ such that $s^t = 1$. If the chain is
irreducible, then the smallest integer of this kind is the period of the chain.

25. Mean ergodic theorem.²⁶ In an arbitrary chain let

$A_{jk}^{(n)} = \dfrac{1}{n} \sum_{\nu=1}^{n} p_{jk}^{(\nu)}.$


2° This theorem is a simple consequence of the 
However, it is much weaker and can therefore i H 
K. Yosida and S. Kakutani, Markoff process with an enumerable infinite number
of possible states, Japanese Journal of Mathematics, vol. 16 (1939), pp. 47-55.
If $E_j$ and $E_k$ belong to the same irreducible closed set, then $A_{jk}^{(n)}$ tends to a
limit which is independent of $j$ and equals the stationary probability $u_k$,
whenever the latter exists. If $E_j$ and $E_k$ belong to different closed sets, then
$A_{jk}^{(n)} = 0$ for all $n$. If $E_k$ is transient, then $A_{jk}^{(n)} \to 0$ for all $j$.

26. Moving averages. Let $\{Y_n\}$ be a sequence of mutually independent
random variables, each assuming the values $\pm 1$ with probability 1/2. Put
$X^{(n)} = (Y_n + Y_{n+1})/2$. Find the transition probabilities

$p_{jk}(m, n) = P\{X^{(n)} = k \mid X^{(m)} = j\},$

where $m < n$ and $j, k = -1, 0, 1$. Conclude that $\{X^{(n)}\}$ is not a Markov
process and that (10.3) does not hold.
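
The failure of the identity (10.3) in problem 26 can be seen by brute-force enumeration of the equally likely sign sequences. The sketch below is one way to do this; the particular times (1, 2, 3) and the pair of states $(1, -1)$ are chosen only as an illustration.

```python
from fractions import Fraction as F
from itertools import product

def x_value(ys, i):
    """X(i) = (Y_i + Y_{i+1}) / 2, with times numbered from 1."""
    return (ys[i - 1] + ys[i]) // 2

def transition(m, n, j, k):
    """P{X(n) = k | X(m) = j}, found by enumerating Y_1, ..., Y_{n+1}."""
    hits = total = 0
    for ys in product((-1, 1), repeat=n + 1):
        if x_value(ys, m) == j:
            total += 1
            hits += x_value(ys, n) == k
    return F(hits, total)

states = (-1, 0, 1)
two_step = transition(1, 3, 1, -1)
chapman_kolmogorov = sum(transition(1, 2, 1, v) * transition(2, 3, v, -1)
                         for v in states)
```

Here `two_step` equals 1/4, because $X^{(3)}$ does not depend on $Y_1$ and $Y_2$, whereas the sum that (10.3) would require equals 1/8.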


27. In a sequence of Bernoulli trials say that the state $E_1$ is observed at
time $n$ if the trials number $n - 1$ and $n$ resulted in success; otherwise the system
is in $E_2$. Find the $n$-step transition probabilities and discuss the
non-Markovian character.

Note: This process is obtained from the chain of problem 1 by lumping
together three states. Such a grouping procedure can be applied to any Markov
chain and destroys the Markovian character. Processes of this type are studied
in the paper by Harris.

28. Mixing of Markov chains. Given two Markov chains with the same
number of states, and matrices $P_1$ and $P_2$. A new process is defined by an
initial distribution and $n$-step transition probabilities $\tfrac12 P_1^n + \tfrac12 P_2^n$. Discuss
the non-Markovian character and the relation to the urn models of chapter V.


CHAPTER XVI* 


Algebraic Treatment 
of Finite Markov Chains 


In this chapter we consider a Markov chain with finitely many
states $E_1, \ldots, E_a$ and a given matrix of transition probabilities $p_{jk}$.
Our main aim is to derive explicit formulas for the $n$-step transition
probabilities $p_{jk}^{(n)}$. We shall not require the results of the preceding
chapter, except the general concepts and notations of section 3.

We shall make use of the method of generating functions and shall
obtain the desired results from the partial fraction expansions of chapter
XI, section 4. Our results can also be obtained directly from the
theory of canonical decompositions of matrices¹ (which in turn can be
derived from our results). Moreover, for finite chains the ergodic
properties proved in chapter XV follow from the results of the present
chapter. However, for simplicity, we shall slightly restrict the generality
and disregard exceptional cases which complicate the general
theory and do not occur in practical examples.

The general method is outlined in section 1 and illustrated in sections
2 and 3. In section 4 special attention is paid to transient states
and absorption probabilities. In section 5 the theory is applied to finding
the variances of the recurrence times of the states $E_j$.


1. GENERAL THEORY 
For every fixed $j$, $k$ we define a generating function

(1.1)    $P_{jk}(s) = \sum_{n=1}^{\infty} p_{jk}^{(n)} s^n.$

* This chapter treats a special topic and may be omitted.

¹ See the treatise by Fréchet cited in chapter XV, section 11.
Multiplying this equation by $s p_{\nu j}$ and adding over all $j$, we get

(1.2)    $s \sum_{j=1}^{a} p_{\nu j} P_{jk}(s) = P_{\nu k}(s) - s p_{\nu k}.$

For every fixed $k$ we have here a system of $a$ non-homogeneous linear
equations for the $a$ unknowns $P_{1k}(s), \ldots, P_{ak}(s)$. Theoretically, this
system can be solved by means of determinants or by successive eliminations
of unknowns. We use only the fact that the determinant $D(s)$
of the system is a polynomial of degree not exceeding $a$, and that the
$P_{jk}(s)$ are rational functions of $s$ with the common denominator $D(s)$.
We shall consider only the case where the equation $D(s) = 0$ has no
multiple roots; this is a slight restriction of generality, but the theory
will cover most cases of practical interest.

Since the $P_{jk}(s)$ are rational functions, the partial fraction expansion
of chapter XI, section 4, shows that there exist coefficients $\rho_{jk}^{(1)}, \ldots,
\rho_{jk}^{(a)}$ such that

(1.3)    $p_{jk}^{(n)} = \dfrac{\rho_{jk}^{(1)}}{s_1^{n}} + \cdots + \dfrac{\rho_{jk}^{(a)}}{s_a^{n}},$

where $s_1, s_2, \ldots$ are the roots of $D(s) = 0$. If the degree of $D(s)$ is
smaller than $a$, then (1.3) will contain fewer than $a$ terms. It is also
possible that for some particular values of $j$ and $k$ one or more roots $s_r$
are common to the numerator and denominator and hence cancel. We
take care of such cases by letting the corresponding $\rho_{jk}^{(r)}$ be zero.

We could calculate the roots $s_r$ and the coefficients $\rho_{jk}^{(r)}$ by the methods
of chapter XI, but it is preferable to take advantage of certain particular
properties of Markov chains. Write (1.3) with $j$ replaced by $\nu$, multiply by $p_{j\nu}$,
and sum over $\nu = 1, 2, \ldots, a$. The result is

(1.4)    $p_{jk}^{(n+1)} = \sum_{\nu=1}^{a} p_{j\nu} \left\{ \dfrac{\rho_{\nu k}^{(1)}}{s_1^{n}} + \cdots + \dfrac{\rho_{\nu k}^{(a)}}{s_a^{n}} \right\}.$

If the left side is expressed by means of (1.3), we get an identity
which can hold for all $n$ only if the coefficients of $s_1^{-n}, \ldots, s_a^{-n}$ on both
sides are equal. This means that for every fixed $r$ we must have

(1.5a)    $\rho_{jk}^{(r)} = s_r \sum_{\nu=1}^{a} p_{j\nu} \rho_{\nu k}^{(r)}.$

In like manner we get, on multiplying (1.3) by $p_{km}$ and adding over all $k$,

(1.5b)    $\rho_{jm}^{(r)} = s_r \sum_{k=1}^{a} \rho_{jk}^{(r)} p_{km}.$
The relations (1.5a) show that for $k$ and $r$ fixed the $a$ quantities
$\rho_{1k}^{(r)}, \ldots, \rho_{ak}^{(r)}$ are a solution of the system of $a$ linear equations

(1.6a)    $x_j^{(r)} = s_r \sum_{\nu=1}^{a} p_{j\nu} x_\nu^{(r)} \qquad (j = 1, \ldots, a).$

Similarly, relations (1.5b) imply that for $j$ and $r$ fixed, the $\rho_{j1}^{(r)}, \ldots, \rho_{ja}^{(r)}$
satisfy the $a$ linear equations

(1.6b)    $y_m^{(r)} = s_r \sum_{k=1}^{a} y_k^{(r)} p_{km} \qquad (m = 1, \ldots, a).$

For a better understanding let us replace $s_r$ by an arbitrary $s$ and
study the two more general systems

(1.7a)    $x_j = s \sum_{\nu=1}^{a} p_{j\nu} x_\nu \qquad (j = 1, \ldots, a)$

and

(1.7b)    $y_m = s \sum_{k=1}^{a} y_k p_{km} \qquad (m = 1, \ldots, a).$


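These two systems have the same determinant as the system (1.2) for the
$P_{jk}(s)$, and we could define
the roots $s_r$ as those numbers (real or complex) for which the systems
(1.7a) and (1.7b) admit of non-trivial solutions. The assumption that
$s_r$ is a simple root means that for every fixed $r$ the solutions $(x_1^{(r)}, \ldots, x_a^{(r)})$
and $(y_1^{(r)}, \ldots, y_a^{(r)})$ are determined up to a
numerical factor. However, our starting point was that, with $s = s_r$,
for $k$ and $r$ fixed, $(\rho_{1k}^{(r)}, \ldots, \rho_{ak}^{(r)})$ is a solution of (1.7a), while for $j$ and
$r$ fixed $(\rho_{j1}^{(r)}, \ldots, \rho_{ja}^{(r)})$ is a solution of (1.7b). Since these solutions are
determined up to a numerical factor, we must have

(1.8)    $\rho_{jk}^{(r)} = c_r x_j^{(r)} y_k^{(r)},$

where $c_r$ is a constant independent of $j$ and $k$.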
From (1.8) and (1.3) we have

(1.9)    $p_{jk}^{(n)} = \sum_{r=1}^{a} c_r x_j^{(r)} y_k^{(r)} s_r^{-n}.$

Therefore

(1.9a)    $P_{jk}(s) = c_1 x_j^{(1)} y_k^{(1)} \dfrac{s}{s_1 - s} + \cdots + c_a x_j^{(a)} y_k^{(a)} \dfrac{s}{s_a - s}.$

Using (1.6a) we get from (1.1)

(1.10)    $\sum_{k=1}^{a} P_{jk}(s) x_k^{(r)} = x_j^{(r)} \sum_{n=1}^{\infty} s^n s_r^{-n} = x_j^{(r)} \dfrac{s}{s_r - s}.$

On the other hand, if we evaluate the left side using (1.9a), we are
led to a sum of $a$ fractions with denominators $s_\nu - s$. It follows
that for $\nu \ne r$ the numerators must vanish and

(1.11)    $1 = c_r \sum_{\nu=1}^{a} x_\nu^{(r)} y_\nu^{(r)},$

and thus we have found $c_r$. It is true that the solutions $x_j^{(r)}$ and $y_k^{(r)}$
are determined only up to a numerical factor. However, if we replace
the $x_j^{(r)}$ by $A x_j^{(r)}$, and the $y_k^{(r)}$ by $B y_k^{(r)}$, then $c_r$ will be changed into
$c_r/AB$ and the quantity $\rho_{jk}^{(r)}$ of (1.8) remains unchanged.
Summarizing, we have the following procedure to calculate $p_{jk}^{(n)}$.


Write down the two systems of linear equations (1.7a) and (1.7b).
They have a common determinant and admit of non-trivial solutions only
for values of $s$ for which this determinant vanishes. We suppose that
the roots $s_1, s_2, \ldots$ (of which there are at most $a$) are simple; then for
each $r$, the solutions $(x_1^{(r)}, \ldots, x_a^{(r)})$ and $(y_1^{(r)}, \ldots, y_a^{(r)})$ are determined up
to an arbitrary multiplicative constant. Find these solutions and the constants
$c_r$ from (1.11). Then $p_{jk}^{(n)}$ is given by (1.9).


For every fixed $r$ the $\rho_{jk}^{(r)}$ form a matrix which may be constructed in
the following way. Form a multiplication table with the $x_j^{(r)}$ heading
the rows and the $y_k^{(r)}$ heading the columns. Multiplying all $a^2$ elements
of this square table by $c_r$, we get the matrix $(\rho_{jk}^{(r)})$. To construct the
matrix $(p_{jk}^{(n)})$ we have to divide all elements of $(\rho_{jk}^{(r)})$ by $s_r^n$ and add the
matrices thus obtained for $r = 1, 2, \ldots, a$. Note that the roots $s_r$
may be simple even if there are fewer than $a$ roots.

The case of multiple roots requires certain changes but may be
treated by similar methods. The case of greatest interest will be discussed
in section 4.
In algebra the reciprocals $\lambda_r = 1/s_r$ are called characteristic values (or eigen values
or latent roots) of the matrix $P$. Zero is a possible characteristic value, but to it
there corresponds no root $s_r$. This explains why there may be fewer than $a$ roots $s_r$
even though there are always $a$ characteristic values. The use of $s_r$ rather than of
their reciprocals is more convenient for the method of generating functions. Moreover,
it corresponds to the general usage in the theory of integral equations and is
therefore more natural in probability theory.

The value $s = 1$ always occurs among the $s_r$, and to it there corresponds the solution
$(1, 1, \ldots, 1)$ of (1.7a). For all $r$ we have $|s_r| \ge 1$.² In fact, a root $s_r$ with $|s_r| < 1$
would lead to a divergent development in (1.3). If $s_1 = 1$ is the only root with
$|s_r| = 1$, then $p_{jk}^{(n)} \to c_1 x_j^{(1)} y_k^{(1)}$. It is not difficult to show that if there exist
other roots with $|s_r| = 1$, then they are necessarily $t$th roots of unity, where $t$ is
an integer; in this case the chain has period $t$. For details the reader is referred to
Fréchet's treatise quoted in chapter XV, section 11.

Often it is cumbersome or impossible to find all roots $s_r$. However, it is clear
that the asymptotic behavior of $p_{jk}^{(n)}$ is determined in first approximation by the
$s_r$ with $|s_r| = 1$, and in second approximation by the roots $s_r$ with the next smallest
absolute value.

The final formula (1.9) can be written more elegantly in matrix notation. Let
$X^{(r)}$ be the column vector (or an $a \times 1$ matrix) with elements $x_j^{(r)}$ and let $Y^{(r)}$ be
the row vector (or a $1 \times a$ matrix) with elements $y_k^{(r)}$. Then $X^{(r)} Y^{(r)}$ is the $a \times a$
matrix with elements $x_j^{(r)} y_k^{(r)}$ and (1.9) takes on the form

(1.12)    $P^n = \sum_{r=1}^{a} c_r s_r^{-n} X^{(r)} Y^{(r)}, \qquad \text{where } c_r^{-1} = Y^{(r)} X^{(r)}.$

The vectors $X^{(r)}$ and $Y^{(r)}$ are called latent vectors or eigen vectors, and $c_r^{-1}$ is their
inner product.

² A direct proof is as follows. Let $M$ be the largest among $|x_1^{(r)}|, \ldots, |x_a^{(r)}|$
($r$ fixed). Then from (1.6a) $M \le |s_r| \sum_\nu p_{j\nu} M = |s_r| M$, or $|s_r| \ge 1$.

2. EXAMPLES


(a) Consider first a chain with only two states. The matrix of
transition probabilities assumes the simple form

$P = \begin{pmatrix} 1 - p & p \\ \alpha & 1 - \alpha \end{pmatrix}$

where $0 < p < 1$ and $0 < \alpha < 1$. The equations (1.7a) reduce to
$s(1 - p)x_1 + s p x_2 = x_1$ and $s \alpha x_1 + s(1 - \alpha)x_2 = x_2$. Equating the
two ratios $x_1/x_2$, it is found that a solution exists only if either $s = 1$
or $s = 1/(1 - \alpha - p)$. The solution corresponding to $s_1 = 1$ is $(1, 1)$;
the solution corresponding to $s_2 = 1/(1 - \alpha - p)$ is $(p, -\alpha)$. Next
take the system (1.7b), which now reduces to $s(1 - p)y_1 + s \alpha y_2 = y_1$
and $s p y_1 + s(1 - \alpha)y_2 = y_2$. We know that it can be solved only for
$s = s_1$ and $s = s_2$; the corresponding solutions are $(\alpha, p)$ and
$(1, -1)$. From (1.11) we get $c_1 = c_2 = 1/(\alpha + p)$. The equations (1.9)
and (1.11) now enable us to write down explicit formulas for the quantities
$p_{jk}^{(n)}$. The final result can be written in matrix form

$P^n = \dfrac{1}{\alpha + p} \begin{pmatrix} \alpha & p \\ \alpha & p \end{pmatrix} + \dfrac{(1 - \alpha - p)^n}{\alpha + p} \begin{pmatrix} p & -p \\ -\alpha & \alpha \end{pmatrix}$

(where factors common to all four elements have been taken out as
factors to the matrices). Since $|1 - \alpha - p| < 1$, the second matrix
tends to zero as $n \to \infty$, and the first matrix represents the limiting
form of $P^n$.
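
The closed form for the two-state chain is easy to test against direct matrix multiplication. The following is a minimal sketch; the values `p = 1/3` and `alpha = 1/4` and the exponent 5 are arbitrary choices made for illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two 2 x 2 (or larger square) matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def two_state_power(p, alpha, n):
    """Closed form of P^n from example (a)."""
    decay = (1 - alpha - p) ** n
    c = 1 / (alpha + p)                      # c_1 = c_2 from (1.11)
    limit = [[alpha, p], [alpha, p]]         # table of x^(1) times y^(1)
    transient = [[p, -p], [-alpha, alpha]]   # table of x^(2) times y^(2)
    return [[c * (limit[i][j] + decay * transient[i][j]) for j in range(2)]
            for i in range(2)]

p, alpha = F(1, 3), F(1, 4)                  # hypothetical values
P = [[1 - p, p], [alpha, 1 - alpha]]

direct = P
for _ in range(4):                           # direct = P^5
    direct = matmul(direct, P)
closed = two_state_power(p, alpha, 5)
```

With exact fractions the two matrices `direct` and `closed` are identical, and for $n = 1$ the closed form returns $P$ itself.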


(b) Let

(2.1)    $P = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$

[this is the matrix of problem XV, 2(b)]. The system (1.7a) reduces to

(2.2)    $x_1 = s x_4, \quad x_2 = s x_4, \quad x_3 = \dfrac{s(x_1 + x_2)}{2}, \quad x_4 = s x_3.$

Since a multiplicative constant remains arbitrary, we may put $x_4 = 1$.
Then $x_1 = s$, $x_2 = s$, $x_3 = s^2$, $x_4 = s^3$, and therefore we must have
$s^3 = 1$. Now if we put

(2.3)    $\theta = e^{2\pi i/3} = \cos\dfrac{2\pi}{3} + i \sin\dfrac{2\pi}{3},$

then the three roots of $s^3 = 1$ are $s_1 = 1$, $s_2 = \theta$, $s_3 = \theta^2$. (Note that
we have only three roots, even though there are four states.) The
solutions $x_j^{(r)}$ corresponding to the three roots are $(1, 1, 1, 1)$, $(\theta, \theta, \theta^2, 1)$,
$(\theta^2, \theta^2, \theta, 1)$.

From system (1.7b) we get $y_1 = s y_3/2$, $y_2 = s y_3/2$, $y_3 = s y_4$,
$y_4 = s(y_1 + y_2)$. The three sets of solutions corresponding to $s_1 = 1$,
$s_2 = \theta$, and $s_3 = \theta^2$ are $(1, 1, 2, 2)$, $(\theta, \theta, 2, 2\theta^2)$, $(\theta^2, \theta^2, 2, 2\theta)$. Therefore
from (1.11) $c_1 = 1/6$, $c_2 = 1/(6\theta^2) = \theta/6$, $c_3 = 1/(6\theta) = \theta^2/6$. We
are now able to express all $p_{jk}^{(n)}$. For example,

(2.4)    $p_{11}^{(n)} = \dfrac{1 + \theta^{n} + \theta^{2n}}{6}, \qquad p_{13}^{(n)} = \dfrac{1 + \theta^{n+1} + \theta^{2n+2}}{3},$

etc. The chain is obviously periodic with period 3.
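
The procedure of section 1 can be followed numerically for this example. The sketch below uses the three roots and the solutions $(1,1,1,1)$, $(\theta,\theta,\theta^2,1)$, $(\theta^2,\theta^2,\theta,1)$ of (1.7a), computes the constants $c_r$ from (1.11), and compares the sum (1.9) with powers of $P$ obtained by plain multiplication. The left solutions are taken as $(1,1,2,2)$, $(\theta,\theta,2,2\theta^2)$, $(\theta^2,\theta^2,2,2\theta)$, as obtained by solving the system (1.7b) for each root; the comparison range $n = 1, \ldots, 7$ is an arbitrary choice.

```python
import cmath

theta = cmath.exp(2j * cmath.pi / 3)

P = [[0, 0, 0, 1],
     [0, 0, 0, 1],
     [0.5, 0.5, 0, 0],
     [0, 0, 1, 0]]

roots = [1, theta, theta**2]
xs = [[1, 1, 1, 1],
      [theta, theta, theta**2, 1],
      [theta**2, theta**2, theta, 1]]
ys = [[1, 1, 2, 2],
      [theta, theta, 2, 2 * theta**2],
      [theta**2, theta**2, 2, 2 * theta]]

def spectral_power(n):
    """Formula (1.9): sum over r of c_r x_j y_k s_r^(-n)."""
    out = [[0j] * 4 for _ in range(4)]
    for s, x, y in zip(roots, xs, ys):
        c = 1 / sum(xv * yv for xv, yv in zip(x, y))   # (1.11)
        for j in range(4):
            for k in range(4):
                out[j][k] += c * x[j] * y[k] * s ** (-n)
    return out

def direct_power(n):
    """P^n by repeated multiplication, n >= 1."""
    R = P
    for _ in range(n - 1):
        R = [[sum(R[i][m] * P[m][k] for m in range(4)) for k in range(4)]
             for i in range(4)]
    return R

max_error = max(abs(spectral_power(n)[j][k] - direct_power(n)[j][k])
                for n in range(1, 8) for j in range(4) for k in range(4))
```

The quantity `max_error` is at the level of rounding error. Only three roots enter the sum; the characteristic value zero has no corresponding root and contributes nothing for $n \ge 1$.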
(c) Let $p + q = 1$, and

(2.5)    $P = \begin{pmatrix} 0 & p & 0 & q \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ p & 0 & q & 0 \end{pmatrix}.$

[This matrix describes a cyclical random walk; see example XV(2.d).]
The equations (1.7a) reduce to $x_1 = s(p x_2 + q x_4)$, $x_2 = s(q x_1 + p x_3)$,
$x_3 = s(q x_2 + p x_4)$, $x_4 = s(p x_1 + q x_3)$. Suppose that $p \ne q$. From the
first and the third equations we find $x_1 + x_3 = s(x_2 + x_4)$, and from
the remaining equations $x_2 + x_4 = s(x_1 + x_3)$. Hence we have either
$s^2 = 1$ or $x_1 + x_3 = x_2 + x_4 = 0$. The first alternative leads to the
two roots $s_1 = 1$, $s_2 = -1$. On the other hand, substituting $x_3 = -x_1$,
$x_4 = -x_2$ into the first two equations, we find $s^2 (p - q)^2 = -1$, which
yields the remaining two roots $s_3$ and $s_4$. Thus

(2.6)    $s_1 = 1, \quad s_2 = -1, \quad s_3 = \dfrac{i}{q - p}, \quad s_4 = \dfrac{-i}{q - p}$

(where $i^2 = -1$). The corresponding solutions $x_j^{(r)}$ contain an arbitrary
factor, and we are free to put $x_4^{(r)} = 1$. Then the four sets of solutions
are easily found to be $(1, 1, 1, 1)$, $(-1, 1, -1, 1)$, $(i, -1, -i, 1)$,
$(-i, -1, i, 1)$. The system (1.7b) reduces in our case to $y_1 = s(q y_2 + p y_4)$,
$y_2 = s(p y_1 + q y_3)$, $y_3 = s(p y_2 + q y_4)$, $y_4 = s(q y_1 + p y_3)$. To
the four roots (2.6) there correspond the solutions $(1, 1, 1, 1)$,
$(-1, 1, -1, 1)$, $(-i, -1, i, 1)$, $(i, -1, -i, 1)$. For the constants $c_r$
we find from (1.11) $c_1 = c_2 = c_3 = c_4 = 1/4$. Using (1.3) and (1.8), we
can now write an explicit formula for each sequence $p_{jk}^{(n)}$ ($n = 1, 2,
3, \ldots$). In the present case the solutions $x_j^{(r)}$ and $y_k^{(r)}$ are of the simple
form $(\alpha, \alpha^2, \alpha^3, \alpha^4)$, where $\alpha$ is one of the four numbers $1, -1, i$, or $-i$.
This enables us to express the $p_{jk}^{(n)}$ by the single formula

(2.7)    $p_{jk}^{(n)} = \tfrac14 \{1 + (q - p)^n \, i^{\,j-k-n}\} \{1 + (-1)^{k+j-n}\}.$

This formula is valid also for $p = q = 1/2$.

It is seen that the term involving $(q - p)^n$ tends to zero, and that
the other term has period 2.
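
The single formula for the four-state cyclical walk can be compared with direct matrix powers. The matrix used below is the one read off from the equations (1.7a) of this example, with rows $(0, p, 0, q)$, $(q, 0, p, 0)$, $(0, q, 0, p)$, $(p, 0, q, 0)$, and the closed form is $\tfrac14 \{1 + (q-p)^n i^{j-k-n}\}\{1 + (-1)^{j+k-n}\}$ with states numbered 1 to 4. The values of $p$ and the range of $n$ in this sketch are arbitrary.

```python
def cyclic_matrix(p):
    """Matrix of the four-state cyclical walk of example (c)."""
    q = 1 - p
    return [[0, p, 0, q],
            [q, 0, p, 0],
            [0, q, 0, p],
            [p, 0, q, 0]]

def closed_form(j, k, n, p):
    """Closed form for p_jk^(n); states j, k are numbered 1..4."""
    q = 1 - p
    value = 0.25 * (1 + (q - p) ** n * 1j ** (j - k - n)) * (1 + (-1) ** (j + k - n))
    return value.real

def power(P, n):
    """P^n by repeated multiplication, n >= 1."""
    R = P
    for _ in range(n - 1):
        R = [[sum(R[i][m] * P[m][k] for m in range(4)) for k in range(4)]
             for i in range(4)]
    return R

def max_deviation(p, n_max=6):
    """Largest difference between the closed form and direct powers."""
    P = cyclic_matrix(p)
    return max(abs(closed_form(j + 1, k + 1, n, p) - power(P, n)[j][k])
               for n in range(1, n_max + 1) for j in range(4) for k in range(4))

deviation_biased = max_deviation(0.3)
deviation_symmetric = max_deviation(0.5)
```

Both deviations are of the order of rounding error, in line with the remark that the formula remains valid for $p = q = 1/2$.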

(d) General cyclical random walk [example XV(2.d)]. In the preceding
example we were able to express the $x_j^{(r)}$ and $y_k^{(r)}$ as powers of the
four fourth roots of unity. This suggests trying a similar procedure
for the general matrix of example XV(2.d). It is convenient to number
the states from $0$ to $a - 1$. For brevity we put

(2.8)    $\theta = e^{2\pi i/a}.$

This is an $a$th root of unity, and all $a$th roots are represented by the
sequence $1, \theta, \theta^2, \ldots, \theta^{a-1}$. It is easily seen that the systems (1.7a)
and (1.7b) are satisfied by the $a$ sets of solutions

(2.9)    $x_j^{(r)} = \theta^{rj}, \qquad y_k^{(r)} = \theta^{-rk}$

with $r = 0, 1, 2, \ldots, a-1$; they correspond to

(2.10)    $s_r = \left\{ \sum_{\nu=0}^{a-1} q_\nu \theta^{\nu r} \right\}^{-1}.$

From equations (1.11) and (2.9) we find $c_r = 1/a$ for all $r$, and thus
finally

(2.11)    $p_{jk}^{(n)} = \dfrac{1}{a} \sum_{r=0}^{a-1} \theta^{r(j-k)} \left( \sum_{\nu=0}^{a-1} q_\nu \theta^{\nu r} \right)^{n}.$

It is interesting to verify this formula for $n = 1$. The factor of $q_\nu$ is

(2.12)    $\dfrac{1}{a} \sum_{r=0}^{a-1} \theta^{r(j-k+\nu)}.$

This sum is zero except when $j - k + \nu = 0$ or $a$, in which case each
term of the sum equals one. Hence $p_{jk}^{(1)}$ reduces to $q_{k-j}$ if $k \ge j$ and to $q_{a+k-j}$ if
$k < j$, and this is the given matrix $(p_{jk})$.
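
The formula for the general cyclical random walk lends itself to a direct numerical check. The sketch below assumes the cyclic structure read off at the end of the example, namely $p_{jk} = q_{k-j}$ for $k \ge j$ and $q_{a+k-j}$ for $k < j$, with states $0, \ldots, a-1$; the step distribution `qs` is a hypothetical choice.

```python
import cmath

def cyclic_entry(qs, row, col, n):
    """Formula (2.11) for p_{row,col}^(n); states are numbered 0..a-1."""
    a = len(qs)
    theta = cmath.exp(2j * cmath.pi / a)
    total = 0j
    for r in range(a):
        # Reciprocal of the root s_r, cf. (2.10).
        eig = sum(qs[v] * theta ** (v * r) for v in range(a))
        total += theta ** (r * (row - col)) * eig ** n
    return total / a

def cyclic_matrix(qs):
    """Matrix with entries q_{(k - j) mod a}."""
    a = len(qs)
    return [[qs[(k - j) % a] for k in range(a)] for j in range(a)]

def power(P, n):
    """P^n by repeated multiplication, n >= 1."""
    size = len(P)
    R = P
    for _ in range(n - 1):
        R = [[sum(R[i][m] * P[m][k] for m in range(size)) for k in range(size)]
             for i in range(size)]
    return R

qs = [0.1, 0.4, 0.0, 0.3, 0.2]      # hypothetical step distribution, a = 5
P = cyclic_matrix(qs)
max_error = max(abs(cyclic_entry(qs, j, k, n) - power(P, n)[j][k])
                for n in range(1, 6) for j in range(5) for k in range(5))
```

For $n = 1$ the function `cyclic_entry` returns the entries of the given matrix, which is the verification carried out by hand in the text, and `max_error` stays at the level of rounding error for the higher powers tried here.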

(e) The occupancy problem. Example XV(2.g) shows that the classical
occupancy problem can be treated by the method of Markov
chains. The system is in state $j$ if there are $j$ occupied and $a - j$
empty cells. If this is the initial situation and $n$ additional balls are
placed at random, then $p_{jk}^{(n)}$ is the probability that there will be $k$
occupied and $a - k$ empty cells (so that $p_{jk}^{(n)} = 0$ if $k < j$). For $j = 0$
this probability follows from formula II(11.7). We now derive a formula
for $p_{jk}^{(n)}$, thus generalizing the result of chapter II.

Since $p_{jj} = j/a$ and $p_{j,j+1} = (a - j)/a$, it is easily seen that the system
of equations (1.7a) reduces to

(2.13)    $(a - sj) x_j = s(a - j) x_{j+1}, \qquad j = 0, 1, \ldots, a.$

For $s = 1$ we get the solution $x_j = 1$. It is clear that if $s \ne 1$ then
$x_a = 0$, so that $s = 1$ is the only value of $s$ for which all $x_j$ are different
from zero. If $s$ is any other value for which (2.13) has a solution,
then there must exist some index $r < a$ such that $x_{r+1} = 0$ but
$x_r \ne 0$; from (2.13) it then follows that $sr = a$. Thus the roots $s_r$
for which (2.13) has solutions are $s_r = a/r$ with $r = 1, 2, \ldots, a$. The
corresponding solutions of (2.13) are obtained successively, putting
$x_0^{(r)} = 1$, and $j = 0, 1, \ldots$. We find

(2.14)    $x_j^{(r)} = \binom{r}{j} \div \binom{a}{j},$

so that $x_j^{(r)} = 0$ when $j > r$.

For $s = s_r$ the system (1.7b) reduces to

(2.15)    $(r - j) y_j^{(r)} = (a - j + 1) y_{j-1}^{(r)}$

and has the solution

(2.16)    $y_j^{(r)} = (-1)^{j-r} \binom{a-r}{j-r},$

where, of course, $y_j^{(r)} = 0$ if $j < r$. Since $x_j^{(r)} = 0$ for $j > r$ and $y_j^{(r)} = 0$
for $j < r$, we easily find from equation (1.11) that $c_r = (x_r^{(r)} y_r^{(r)})^{-1} = \binom{a}{r}$,
and hence

(2.17)    $p_{jk}^{(n)} = \sum_{r=j}^{k} \binom{a}{r} \dfrac{\binom{r}{j}}{\binom{a}{j}} (-1)^{k-r} \binom{a-r}{k-r} \left( \dfrac{r}{a} \right)^{n}.$

On expressing the binomial coefficients in terms of factorials, this
formula simplifies to

(2.18)    $p_{jk}^{(n)} = \binom{a-j}{a-k} \sum_{\nu=0}^{k-j} \left( \dfrac{\nu + j}{a} \right)^{n} (-1)^{k-j-\nu} \binom{k-j}{\nu},$

with $p_{jk}^{(n)} = 0$ if $k < j$.
(Further examples are found in the following two sections.)
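
The occupancy formula can be checked exactly for a small number of cells. The sketch below builds the chain from $p_{jj} = j/a$ and $p_{j,j+1} = (a - j)/a$ and compares its powers with the closed form written as $\binom{a-j}{a-k} \sum_{\nu} (-1)^{k-j-\nu} \binom{k-j}{\nu} ((j+\nu)/a)^n$; the choice `a = 5` and the range of `n` are arbitrary.

```python
from fractions import Fraction as F
from math import comb

def occupancy(a, j, k, n):
    """Closed form for p_jk^(n) in the occupancy chain with a cells."""
    if k < j:
        return F(0)
    return comb(a - j, a - k) * sum(
        (-1) ** (k - j - v) * comb(k - j, v) * F(j + v, a) ** n
        for v in range(k - j + 1))

def occupancy_matrix(a):
    """States 0..a; p_jj = j/a and p_{j,j+1} = (a - j)/a."""
    P = [[F(0)] * (a + 1) for _ in range(a + 1)]
    for j in range(a + 1):
        P[j][j] = F(j, a)
        if j < a:
            P[j][j + 1] = F(a - j, a)
    return P

def power(P, n):
    """P^n by repeated multiplication, n >= 1."""
    size = len(P)
    R = P
    for _ in range(n - 1):
        R = [[sum(R[i][m] * P[m][k] for m in range(size)) for k in range(size)]
             for i in range(size)]
    return R

a = 5                                   # hypothetical number of cells
P = occupancy_matrix(a)
agrees = all(occupancy(a, j, k, n) == power(P, n)[j][k]
             for n in range(1, 7) for j in range(a + 1) for k in range(a + 1))
```

With exact fractions `agrees` is true for every pair of states and every $n$ tried; for instance, starting from five empty cells, three balls occupy three distinct cells with probability $12/25$.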


3. RANDOM WALK WITH REFLECTING BARRIERS 


The application of Markov chains will now be illustrated by a
complete discussion of a random walk with states $1, 2, \ldots, a$ and two
reflecting barriers.³ The rows number $2, 3, \ldots, a-1$ of the matrix $P$
are determined by $p_{k,k+1} = p$ and $p_{k,k-1} = q$; the first and the last
rows are defined by $(q, p, 0, \ldots, 0)$ and $(0, \ldots, 0, q, p)$. The matrix of


³ Part of what follows is a repetition of the theory of chapter XIV. Our quadratic
equation occurs there as (4.7); the quantities $\lambda_1(s)$ and $\lambda_2(s)$ of the text were given
in (4.8), and the general solution (3.3) appears in chapter XIV as (4.9). The two
methods are related, but in many cases the computational details will differ radically.
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example XV(2.c) reduces to this when ô = 1. In the terminology of 
random walks, p? is the probability that the particle which starts 
from x = j is at time n at z = k. 

The equations (1.7a) take on the form 


s(qzı + pr) 
(3.1) zj = s(gzj—1 + pajy1) (j = 2,3, ..., a—1) 


Ta = 3(QFa—1 + Pa). 


Tı 


This system admits the solution $x_j = 1$ corresponding to the root
$s = 1$. To find all other solutions we apply the method of particular
solutions (which we have used for similar equations in chapter XIV,
section 4). The middle equation in (3.1) is satisfied by $x_j = \lambda^j$ provided
that $\lambda$ is a root of the quadratic equation $\lambda = qs + \lambda^2 ps$. The two
roots of this equation are

(3.2)   $$\lambda_1(s) = \frac{1 + (1 - 4pqs^2)^{1/2}}{2ps}, \qquad \lambda_2(s) = \frac{1 - (1 - 4pqs^2)^{1/2}}{2ps},$$

and the most general solution of the middle equation in (3.1) is therefore

(3.3)   $$x_j = A(s)\lambda_1^j(s) + B(s)\lambda_2^j(s),$$


where A(s) and B(s) are arbitrary. The first and the last equation in
(3.1) will be satisfied by (3.3) if and only if $x_0 = x_1$ and $x_a = x_{a+1}$.
This requires that A(s) and B(s) satisfy the conditions

(3.4)   $$\begin{aligned} A(s)\{1 - \lambda_1(s)\} + B(s)\{1 - \lambda_2(s)\} &= 0, \\ A(s)\lambda_1^a(s)\{1 - \lambda_1(s)\} + B(s)\lambda_2^a(s)\{1 - \lambda_2(s)\} &= 0. \end{aligned}$$

However, these two equations are compatible only if

(3.5)   $$\lambda_1^a(s) = \lambda_2^a(s),$$

and we have to determine the values of s for which (3.5) is possible.

From the definition (3.2) we have $\lambda_1(s)\lambda_2(s) = q/p$, and (3.5) implies
that $\lambda_1(s)(p/q)^{1/2}$ and $\lambda_2(s)(p/q)^{1/2}$ must be (2a)th roots of unity. These
roots can be written in the form

(3.6)   $$e^{i\pi r/a} = \cos\frac{\pi r}{a} + i \sin\frac{\pi r}{a},$$

where $i^2 = -1$ and $r = 0, 1, 2, \ldots, 2a-1$. Thus all solutions of (3.5)
are among the roots of

$$\lambda_1(s) = \left(\frac{q}{p}\right)^{1/2} e^{i\pi r/a}, \qquad \lambda_2(s) = \left(\frac{q}{p}\right)^{1/2} e^{-i\pi r/a}.$$

To each value r we can find a root $s_r$, namely

(3.7)   $$s_r = \{2(pq)^{1/2} \cos \pi r/a\}^{-1}.$$


The value $r = a$ must be disregarded, since for it $\lambda_1(s) = \lambda_2(s)$,
$A(s) = -B(s)$, so that it leads only to the trivial solution $x_j = 0$.
To $r = 0$ there corresponds the solution $x_j = 1$, which we have already
considered. To $r = 1, 2, \ldots, a-1$ there correspond $a - 1$ distinct
solutions; if we let $r = a+1, a+2, \ldots, 2a-1$, we get the same solu-
tions with $\lambda_1(s)$ and $\lambda_2(s)$ interchanged. Thus we have found $a$ distinct
sets of solutions of (3.1), and we know that there can be no more.

For $s = s_r$ with $r = 1, 2, \ldots, a-1$ we get from (3.4) $2A(s) = 1 - \lambda_2(s)$
and $2B(s) = -\{1 - \lambda_1(s)\}$. (Remember that a multiplicative
constant remains arbitrary.) Substituting into (3.3), we find the
$a - 1$ sets of solutions

(3.8)   $$x_j^{(r)} = \left(\frac{q}{p}\right)^{j/2} \sin\frac{\pi r j}{a} - \left(\frac{q}{p}\right)^{(j+1)/2} \sin\frac{\pi r (j-1)}{a}$$

$(r = 1, 2, \ldots, a-1)$. To this we add the solution previously found

(3.9)   $$x_j^{(0)} = 1.$$

It is easy to verify that (3.8) and (3.9) represent solutions of the given
system (3.1).

We have now to find solutions of the second system of linear equa-
tions. In the present case (1.7b) takes on the form

(3.10)   $$\begin{aligned} y_1 &= sq(y_1 + y_2), \\ y_k &= s(p y_{k-1} + q y_{k+1}) \qquad (k = 2, \ldots, a-1), \\ y_a &= sp(y_{a-1} + y_a). \end{aligned}$$

The middle equation is the same as in (3.1) with p and q interchanged,
and its general solution is therefore obtained from (3.3) simply by
interchanging p and q. The first and the last equations can be satis-
fied if $s = s_r$, and a simple calculation shows that for $r = 1, 2, \ldots, a-1$
the solution of (3.10) is

(3.11)   $$y_k^{(r)} = \left(\frac{p}{q}\right)^{k/2} \sin\frac{\pi r k}{a} - \left(\frac{p}{q}\right)^{(k-1)/2} \sin\frac{\pi r (k-1)}{a}.$$
For $s = 1$ we find similarly

(3.12)   $$y_k^{(0)} = \left(\frac{p}{q}\right)^k.$$

The next step consists in evaluating the coefficients $c_r$ in (1.11).
The sum simplifies if $\sin^2 \pi r j/a$ is expressed in terms of the cosine of
the double angle, and this in turn by means of complex exponentials.
Then we have only to sum finite geometric series and find easily

(3.13)   $$c_r = \frac{2p}{a} \left\{1 - 2(pq)^{1/2} \cos\frac{\pi r}{a}\right\}^{-1} \qquad (r = 1, 2, \ldots, a-1).$$

For $r = 0$ we get

(3.14)   $$c_0 = \frac{q}{p} \cdot \frac{(p/q) - 1}{(p/q)^a - 1},$$

provided that $p \neq q$. If $p = q = \frac12$, then (3.13) remains valid, but
(3.14) is to be replaced by $c_0 = 1/a$.
These formulas lead to the final result

(3.15)   $$p_{jk}^{(n)} = \frac{(p/q) - 1}{(p/q)^a - 1} \left(\frac{p}{q}\right)^{k-1} + \frac{2^{n+1}}{a}\, p^{1 + \frac12 (n-j+k)}\, q^{\frac12 (n+j-k)} \sum_{r=1}^{a-1} S_r,$$

where $S_r$ stands for

$$S_r = \frac{\cos^n\frac{\pi r}{a} \left\{\sin\frac{\pi r j}{a} - \left(\frac{q}{p}\right)^{1/2} \sin\frac{\pi r (j-1)}{a}\right\} \left\{\sin\frac{\pi r k}{a} - \left(\frac{q}{p}\right)^{1/2} \sin\frac{\pi r (k-1)}{a}\right\}}{1 - 2(pq)^{1/2} \cos\frac{\pi r}{a}}.$$

As $n \to \infty$, the second term in (3.15) tends to zero, and we find again
that $p_{jk}^{(n)}$ tends to a stationary distribution independent of j. (This
limiting distribution was derived by other methods in problem XV, 9.)
Passing to the limit $a \to \infty$, we get the formula for a random walk
with a single reflecting barrier; in the limit, the sum is replaced by an
integral.*


* For analogous formulas in the case of one reflecting and one absorbing barrier
see M. Kac, Random walk and the theory of Brownian motion, American Mathe-
matical Monthly, vol. 54 (1947), pp. 369-391. The definition of the reflecting bar-
rier is there modified so that the particle may reach 0; whenever this occurs, the
next step takes it to 1. The explicit formulas are then more complicated. Kac
also found formulas for $p_{jk}^{(n)}$ in the Ehrenfest model [example XV(2.f)].
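
As a numerical check of (3.15), the following Python sketch compares the closed form with direct powers of the matrix P. It is an illustration only: the function names are ours, the states are numbered 1, ..., a as in the text, and the closed form is evaluated under the assumptions $p \neq q$ and $n \geq 1$.

```python
import math

def reflecting_walk_matrix(a, p):
    """Transition matrix for states 1..a with two reflecting barriers."""
    q = 1.0 - p
    P = [[0.0] * a for _ in range(a)]
    P[0][0], P[0][1] = q, p                    # first row (q, p, 0, ..., 0)
    P[a - 1][a - 2], P[a - 1][a - 1] = q, p    # last row (0, ..., 0, q, p)
    for k in range(1, a - 1):
        P[k][k - 1], P[k][k + 1] = q, p
    return P

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    size = len(P)
    R = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

def p_formula(j, k, n, a, p):
    """Closed form (3.15); assumes p != q and n >= 1."""
    q = 1.0 - p
    ratio = p / q
    stationary = (ratio - 1.0) / (ratio ** a - 1.0) * ratio ** (k - 1)
    root = math.sqrt(q / p)
    total = 0.0
    for r in range(1, a):
        th = math.pi * r / a
        xj = math.sin(th * j) - root * math.sin(th * (j - 1))
        yk = math.sin(th * k) - root * math.sin(th * (k - 1))
        total += (math.cos(th) ** n * xj * yk
                  / (1.0 - 2.0 * math.sqrt(p * q) * math.cos(th)))
    factor = (2.0 ** (n + 1) / a
              * p ** (1 + (n - j + k) / 2) * q ** ((n + j - k) / 2))
    return stationary + factor * total

def max_abs_error(a, p, n):
    """Largest deviation between (3.15) and the n-th matrix power."""
    Pn = mat_pow(reflecting_walk_matrix(a, p), n)
    return max(abs(Pn[j - 1][k - 1] - p_formula(j, k, n, a, p))
               for j in range(1, a + 1) for k in range(1, a + 1))
```

For instance, `max_abs_error(5, 0.6, 7)` should be of the order of rounding errors.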
4. TRANSIENT STATES; ABSORPTION PROBABILITIES


The theorem of section 1 was derived under the assumption that the
roots $s_1, s_2, \ldots$ are distinct. The presence of multiple roots does not
require essential modifications, but we shall discuss only a particular
case of special importance. The root $s_1 = 1$ is multiple whenever the
chain contains two or more closed subchains, and this is a frequent
situation in problems connected with absorption probabilities. It is
easy to adapt the method of section 1 to this case. For conciseness and
clarity, we shall explain the procedure by means of examples which
will reveal the main features of the general case.


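Examples. (a) Consider the matrix of transition probabilities

(4.1)   $$P = \begin{pmatrix} \frac13 & \frac23 & 0 & 0 & 0 & 0 \\ \frac23 & \frac13 & 0 & 0 & 0 & 0 \\ 0 & 0 & \frac14 & \frac34 & 0 & 0 \\ 0 & 0 & \frac15 & \frac45 & 0 & 0 \\ \frac14 & 0 & \frac14 & 0 & \frac14 & \frac14 \\ \frac16 & \frac16 & \frac16 & \frac16 & \frac16 & \frac16 \end{pmatrix}.$$

It is clear that $E_1$ and $E_2$ form a closed set (that is, no transition is
possible to any of the remaining four states); the same is true of $E_3$ and $E_4$.
The states $E_5$ and $E_6$ are transient, and the system eventually passes
into one of the two closed sets.

The matrix P has the form of a partitioned matrix

(4.2)   $$P = \begin{pmatrix} A & 0 & 0 \\ 0 & B & 0 \\ U & V & T \end{pmatrix},$$

where each letter stands for a two-by-two matrix and each zero for a
matrix with four zeros. For example, A has the rows $(\frac13, \frac23)$ and
$(\frac23, \frac13)$; it is the matrix of transition probabilities corresponding to
the chain formed by the two states $E_1$ and $E_2$. This matrix can be
studied by itself, and the powers $A^n$ can be obtained from example
(2.a). When the powers $P^2, P^3, \ldots$ are calculated, it will be found that
the first two rows are in no way affected by the remaining four rows.
More precisely, $P^n$ has the form

(4.3)   $$P^n = \begin{pmatrix} A^n & 0 & 0 \\ 0 & B^n & 0 \\ U_n & V_n & T^n \end{pmatrix},$$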
where $A^n$, $B^n$, $T^n$ are the nth powers of A, B, and T, respectively, and
can be calculated* by the method of section 1 (cf. example (2.a) where
all calculations are performed). Instead of six equations with six un-
knowns we are confronted only with systems of two equations with two
unknowns each.

It should be noted that the matrices $U_n$ and $V_n$ in (4.3) are not
powers of U and V and cannot be obtained in the same simple way as
$A^n$, $B^n$, and $T^n$. However, in the calculation of $P^2, P^3, \ldots$ the third
and fourth columns never affect the remaining four columns. In other
words, if in $P^n$ the rows and columns corresponding to $E_3$ and $E_4$ are
deleted, we get the matrix

(4.4)   $$\begin{pmatrix} A^n & 0 \\ U_n & T^n \end{pmatrix}$$

which is the nth power of the corresponding submatrix in P, that is, of

(4.5)   $$\begin{pmatrix} A & 0 \\ U & T \end{pmatrix} = \begin{pmatrix} \frac13 & \frac23 & 0 & 0 \\ \frac23 & \frac13 & 0 & 0 \\ \frac14 & 0 & \frac14 & \frac14 \\ \frac16 & \frac16 & \frac16 & \frac16 \end{pmatrix}.$$

Therefore matrix (4.4) can be calculated by the method of section 1,
which in the present case simplifies considerably. The matrix $V_n$ can
be obtained in a similar way.

Usually the explicit forms of $U_n$ and $V_n$ are of interest only inasmuch
as they are connected with absorption probabilities. If the system
starts from, say, $E_5$, what is the probability $\lambda$ that it will eventually pass
into the closed set formed by $E_1$ and $E_2$ (and not into the other closed
set)? What is the probability $\lambda_n$ that this will occur exactly at the nth
step? Clearly $p_{51}^{(n)} + p_{52}^{(n)}$ is the probability that the considered event
occurs at the nth step or before, that is,

$$p_{51}^{(n)} + p_{52}^{(n)} = \lambda_1 + \lambda_2 + \cdots + \lambda_n.$$

Letting $n \to \infty$, we get $\lambda$. A preferable way to calculate $\lambda_n$ is as fol-
lows. The $(n-1)$st step must take the system to a state other than
$E_1$ and $E_2$, that is, to either $E_5$ or $E_6$ (since from $E_3$ or $E_4$ no transi-
tion to $E_1$ and $E_2$ is possible). The nth step then takes the system to

* In T the rows do not add to unity so that T is not a stochastic matrix. How-
ever, the method of section 1 applies without change, except that $s = 1$ is no
longer a root (so that $T^n \to 0$).
$E_1$ or $E_2$. Hence

$$\lambda_n = p_{55}^{(n-1)}(p_{51} + p_{52}) + p_{56}^{(n-1)}(p_{61} + p_{62}) = \tfrac14\, p_{55}^{(n-1)} + \tfrac13\, p_{56}^{(n-1)}.$$

It will be noted that $\lambda_n$ is completely determined by the elements of
$T^{n-1}$, and this matrix is easily calculated. In the present case
$p_{55}^{(n)} = p_{56}^{(n)} = \frac14 \left(\frac{5}{12}\right)^{n-1}$ and hence $\lambda_n = \frac{7}{48} \left(\frac{5}{12}\right)^{n-2}$.
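
The relation between $\lambda_n$ and the powers of P can be checked with exact rational arithmetic. The sketch below is a hedged illustration: the six-state matrix is our reading of (4.1) and should be treated as an assumption, and the helper names `lam` and `absorbed_by` are ours. Only the block structure (4.2) matters for the identity being tested.

```python
from fractions import Fraction as F

# Assumed entries: two closed sets {E1, E2}, {E3, E4} and transient E5, E6.
P = [
    [F(1, 3), F(2, 3), 0, 0, 0, 0],
    [F(2, 3), F(1, 3), 0, 0, 0, 0],
    [0, 0, F(1, 4), F(3, 4), 0, 0],
    [0, 0, F(1, 5), F(4, 5), 0, 0],
    [F(1, 4), 0, F(1, 4), 0, F(1, 4), F(1, 4)],
    [F(1, 6)] * 6,
]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, n):
    size = len(M)
    R = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        R = mat_mul(R, M)
    return R

def lam(n):
    """Probability that, starting from E5, the set {E1, E2} is entered exactly at step n."""
    Q = mat_pow(P, n - 1)
    # The (n-1)st step leaves the system in E5 or E6; the nth step enters E1 or E2.
    return Q[4][4] * (P[4][0] + P[4][1]) + Q[4][5] * (P[5][0] + P[5][1])

def absorbed_by(n):
    """p_51^(n) + p_52^(n): absorption in {E1, E2} at step n or before."""
    Q = mat_pow(P, n)
    return Q[4][0] + Q[4][1]
```

With these assumed entries the partial sums `sum(lam(m) for m in range(1, n + 1))` agree exactly with `absorbed_by(n)`, as the text asserts.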
(b) Brother-sister mating. As a second example we give a complete 


e matrix shows that the 
t which is clear from the 


(4.6) 


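(4.7)   $$T = \begin{pmatrix} \frac12 & \frac14 & 0 & 0 \\ \frac14 & \frac14 & \frac14 & \frac18 \\ 0 & \frac14 & \frac12 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$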


The powers $T^n$ will now be calculated by the method of section 1.
They represent the transition probabilities among transient states.

The equations (1.7a) reduce to

(4.8)   $$\begin{aligned} x_1 &= \frac{s(2x_1 + x_2)}{4}, & x_2 &= \frac{s(2x_1 + 2x_2 + 2x_3 + x_4)}{8}, \\ x_3 &= \frac{s(x_2 + 2x_3)}{4}, & x_4 &= s x_2. \end{aligned}$$

This has a solution only if the determinant vanishes, and this condition
leads to a fourth-degree equation in s. To simplify writing we put

(4.9)   $$\theta_1 = 5^{1/2} - 1, \qquad \theta_2 = 5^{1/2} + 1.$$

Then the four roots $s_r$ are

(4.10)   $$s_1 = 2, \quad s_2 = 4, \quad s_3 = \theta_1, \quad s_4 = -\theta_2,$$

and the corresponding solutions $(x_1^{(r)}, x_2^{(r)}, x_3^{(r)}, x_4^{(r)})$ of (4.8) are

(4.11)   $$(1, 0, -1, 0), \quad (1, -1, 1, -4), \quad (1, \theta_1, 1, \theta_1^2), \quad (1, -\theta_2, 1, \theta_2^2).$$
The system of linear equations for $y_k^{(r)}$ is obtained by specialization
from (1.7b), and the four sets of solutions are in proper order

(4.12)   $$(1, 0, -1, 0), \quad (1, -1, 1, -\tfrac12), \quad (1, \theta_1, 1, \theta_1^2/8), \quad (1, -\theta_2, 1, \theta_2^2/8).$$

From (1.11) we find the four constants $c_1 = \frac12$, $c_2 = \frac15$, $c_3 = \theta_2^2/40$,
$c_4 = \theta_1^2/40$. From (1.8) we get the $\rho_{jk}^{(r)}$ and finally (1.3) gives us
$p_{jk}^{(n)}$ for all transient states, that is, for j, k = 2, 3, 4, 6. For fixed j, k the
sequence $p_{jk}^{(n)}$ is the sum of four geometric series with ratios $s_1^{-1}, s_2^{-1}, s_3^{-1}, s_4^{-1}$.

An absorption in $E_1$ exactly at the nth step is possible only if the
$(n-1)$st step takes the system into either $E_2$ or $E_3$, and the nth step
into $E_1$. The probability for this is $p_{j2}^{(n-1)}/4 + p_{j3}^{(n-1)}/16$. Similarly,
the probability of absorption at $E_5$ is $p_{j3}^{(n-1)}/16 + p_{j4}^{(n-1)}/4$. Sum-
ming over all n we get the probabilities that the system will eventually
pass into and stay in $E_1$ and $E_5$, respectively. The actual calculation
of these probabilities requires only the summation of four geometric
series.
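
The summation of the four geometric series can be sketched in Python as follows. This is an illustration under stated assumptions: the transient states $E_2, E_3, E_4, E_6$ are stored at list positions 0 to 3, the matrix `T` is read off from the coefficients of (4.8), and the function names are ours. The roots and the vectors are those of (4.10) to (4.12).

```python
import math

# Transitions among the transient states, in the order E2, E3, E4, E6.
T = [
    [1 / 2, 1 / 4, 0.0, 0.0],
    [1 / 4, 1 / 4, 1 / 4, 1 / 8],
    [0.0, 1 / 4, 1 / 2, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

t1, t2 = math.sqrt(5) - 1, math.sqrt(5) + 1
roots = [2.0, 4.0, t1, -t2]
xs = [(1, 0, -1, 0), (1, -1, 1, -4), (1, t1, 1, t1 ** 2), (1, -t2, 1, t2 ** 2)]
ys = [(1, 0, -1, 0), (1, -1, 1, -0.5), (1, t1, 1, t1 ** 2 / 8), (1, -t2, 1, t2 ** 2 / 8)]
# Normalizing constants: c_r times the sum of x_k y_k equals 1.
cs = [1.0 / sum(x * y for x, y in zip(xv, yv)) for xv, yv in zip(xs, ys)]

def p_n(j, k, n):
    """n-step probability among transient states as a sum of four geometric terms."""
    return sum(c * xv[j] * yv[k] * s ** (-n)
               for c, xv, yv, s in zip(cs, xs, ys, roots))

def absorption_in_E1(j):
    """Sum over n >= 1 of p_{j,E2}^{(n-1)}/4 + p_{j,E3}^{(n-1)}/16."""
    total = 0.0
    for c, xv, yv, s in zip(cs, xs, ys, roots):
        geometric = s / (s - 1.0)   # sum of s**(-m) over m >= 0, since |s| > 1
        total += c * xv[j] * (yv[0] / 4 + yv[1] / 16) * geometric
    return total
```

Under these assumptions `absorption_in_E1(0)`, the probability of eventual absorption in $E_1$ when starting from $E_2$, evaluates to 3/4 up to rounding.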


5. APPLICATION TO RECURRENCE TIMES


In problem XIII, 19 it is shown how the mean $\mu$ and the variance
$\sigma^2$ of the recurrence time of a recurrent event $\mathcal{E}$ can be calculated in
terms of the probabilities $u_n$ that $\mathcal{E}$ occurs at the nth trial. If $\mathcal{E}$ is
not periodic, then

(5.1)   $$u_n \to \frac{1}{\mu} \qquad \text{and} \qquad \sum_{n=0}^{\infty} \left(u_n - \frac{1}{\mu}\right) = \frac{\sigma^2 - \mu + \mu^2}{2\mu^2},$$

provided that $\sigma^2$ is finite.

If we identify $\mathcal{E}$ with a persistent state $E_j$, then $u_n = p_{jj}^{(n)}$ (and
$u_0 = 1$). In a finite Markov chain all recurrence times have finite
variance (cf. problem XV, 19), so that (5.1) applies. Suppose that $E_j$
is not periodic and that formula (1.3) applies. Then $s_1 = 1$ and
$|s_r| > 1$ for $r = 2, 3, \ldots$, so that $p_{jj}^{(n)} \to \rho_{jj}^{(1)} = 1/\mu_j$. To the term
$u_n - 1/\mu$ of (5.1) there corresponds

(5.2)   $$p_{jj}^{(n)} - \frac{1}{\mu_j} = \sum_{r \ge 2} \rho_{jj}^{(r)} s_r^{-n}.$$

This formula is valid for $n \ge 1$; summing the geometric series with
ratio $s_r^{-1}$, we find

(5.3)   $$\sum_{n=1}^{\infty} \left(p_{jj}^{(n)} - \frac{1}{\mu_j}\right) = \sum_{r \ge 2} \frac{\rho_{jj}^{(r)}}{s_r - 1}.$$

Introducing this into (5.1), and adding the term $u_0 - 1/\mu_j = 1 - 1/\mu_j$ for $n = 0$,
we find that if $E_j$ is a non-periodic persistent
state, then its mean recurrence time is given by $\mu_j = 1/\rho_{jj}^{(1)}$, and the variance
of its recurrence time is

(5.4)   $$\sigma_j^2 = \mu_j^2 - \mu_j + 2\mu_j^2 \sum_{r \ge 2} \frac{\rho_{jj}^{(r)}}{s_r - 1},$$

provided, of course, that formula (1.3) is applicable and $s_1 = 1$. The
case of periodic states and the occurrence of double roots require
only obvious modifications.
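
The relation between the $u_n$ and the moments of the recurrence time can be tried out numerically. The sketch below is an illustration, not part of the original derivation: it uses the identity $\sum_{n \ge 0} (u_n - 1/\mu) = (\sigma^2 - \mu + \mu^2)/(2\mu^2)$ directly, truncating the series after a fixed number of terms, and assumes a finite, irreducible, non-periodic chain. The function name and the parameter `terms` are ours.

```python
def recurrence_moments(P, j, terms=400):
    """Mean and variance of the recurrence time of state j from the sequence u_n.

    P is a list of rows of a stochastic matrix; the series is truncated
    after `terms` steps, which assumes u_n has converged by then.
    """
    size = len(P)
    # Propagate the row vector e_j; its j-th entry after n steps is u_n = p_jj^(n).
    row = [1.0 if k == j else 0.0 for k in range(size)]
    u = [1.0]                                   # u_0 = 1
    for _ in range(terms):
        row = [sum(row[m] * P[m][k] for m in range(size)) for k in range(size)]
        u.append(row[j])
    mu = 1.0 / u[-1]                            # u_n tends to 1/mu
    series = sum(un - 1.0 / mu for un in u)     # truncated sum over n >= 0
    variance = 2.0 * mu * mu * series + mu - mu * mu
    return mu, variance
```

For the two-state chain with rows (0.7, 0.3) and (0.5, 0.5) the recurrence time of the first state is 1 with probability 0.7 and otherwise 1 plus a geometric waiting time, which gives mean 1.6 and variance 1.44; `recurrence_moments([[0.7, 0.3], [0.5, 0.5]], 0)` reproduces these values up to rounding.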




CHAPTER XVII 


The Simplest Time-Dependent Stochastic Processes¹


1. GENERAL ORIENTATION 


Random walks and Markov chains are stochastic processes² where
changes occur only at fixed times, say, t = 1, 2, 3, .... On the other
hand, in chapter VI, sections 5-6, we were concerned with phenomena
such as telephone calls, radioactive disintegrations, and chromosome
breakages, where changes may occur at any time. Obviously a com-
plete description of such processes leads beyond the domain of discrete
probabilities. To fix ideas, consider the incoming calls at a telephone
exchange (or, rather, an idealized mathematical model of the actual
process). Every instant t corresponds to a trial, and the result of an
experiment may be described in terms of a function X(t) giving the
number of calls up to time t. If the first call occurs at time $t_1$, the
second at $t_2$, etc., the function X(t) equals 0 for $0 < t < t_1$, 1 for
$t_1 < t < t_2$, 2 for $t_2 < t < t_3$, etc. Conversely, every non-decreasing
function X(t), assuming only the values 0, 1, 2, ..., represents a pos-
sible development at our telephone exchange. In other words, a com-
plete description of our conceptual experiment calls for a sample space
whose points are functions X(t) (and not sequences as in the case of
discrete trials). A compound event such as “seven calls within a
minute on a certain day” is obviously the aggregate of those X(t)
which satisfy the condition that for some point t of a specified interval
we have $X(t + h) - X(t) \ge 7$, where h represents the span of one
minute.

We cannot deal here with such complicated sample spaces and must
defer the study of the more delicate aspects of the theory. Fortunately,


1 This chapter is almost independent of chapters X-XVI.

2 See footnote 11 of chapter XV.
certain interesting questions can be answered even with the simple
means now at our disposal.

If we limit the consideration to the number of calls X(t) within an
arbitrary but fixed period of duration t, then X(t) is a random variable
of the familiar type, assuming the values 0, 1, 2, .... Let $P_n(t)$ be
the probability that X(t) = n. It is true that the distribution $\{P_n(t)\}$
depends on a continuous parameter, but so do most distributions intro-
duced in this book.

The situation is best illustrated by the Poisson distribution

(1.1)   $$P_n(t) = e^{-\lambda t} \frac{(\lambda t)^n}{n!}.$$


It was derived in chapter VI, section 5, as a limiting form of the bino- 
mial distribution; a more satisfactory derivation is contained in chap- 
ter XII, section 3. We shall not use the results of that chapter, but 
the situation analyzed there is so simple and so typical that a short 
summary may serve as the best introduction to the present chapter. 

Consider a stochastic process represented by an integral-valued ran-
dom variable $X(t) \ge 0$. Intuitively we may interpret X(t), say, as the
cumulative damage by lightning measured to the nearest dollar. We
arrive at a particularly simple mathematical model if we introduce two
postulates as follows. The increment $X(t + s) - X(0)$ during the
time interval from 0 to $t + s$ is the sum of the increments $X(s) - X(0)$
and $X(t + s) - X(s)$ corresponding to the subintervals from 0 to s and
from s to $t + s$. We postulate, first, that these increments $X(s) - X(0)$
and $X(t + s) - X(s)$ are stochastically independent and, secondly, that
the distribution of $X(t + s) - X(s)$ depends only on t (i.e., only on the
length of the interval, not on its position: this is the property of
homogeneity in time).

Let $h_n(t)$ be the probability that $X(t + s) - X(s)$ assumes the
value n (where n = 0, 1, 2, ...). Analytically, the independence of
$X(t + s) - X(s)$ and $X(s) - X(0)$ is expressed by

(1.2)   $$h_n(t + s) = \sum_{j=0}^{n} h_j(s)\, h_{n-j}(t).$$

It has been shown in chapter XII, section 3, that the only distribu-
tion $\{h_n(t)\}$ with the property (1.2) is the compound Poisson distribu-
tion; that is, X(t) has the distribution of a random variable

(1.3)   $$S_N \qquad \text{with} \qquad P\{N = n\} = e^{-\lambda t} \frac{(\lambda t)^n}{n!},$$
where $S_n = Y_1 + Y_2 + \cdots + Y_n$ is the sum of n mutually independent
variables with the common distribution $\{f_j\}$, j = 0, 1, 2, .... In our
example $\{f_j\}$ represents the probability distribution of the damage
from an individual hit by lightning; then (1.3) states that the number
of hits in a time interval of length t obeys the Poisson distribution
(1.1), and that the individual damages are independent random varia-
bles. The variable (1.3) has the same probability distribution as the
change $X(t + s) - X(s)$ during an arbitrary interval of length t, and
we see that this total change is the sum of a random number N of
individual changes or jumps. The number N of changes has a Poisson
distribution (1.1), and the individual jump has the probability distri-
bution $\{f_j\}$. In particular, the Poisson distribution (1.1) itself repre-
sents the special case where all jumps are of unit length (that is,
$f_1 = 1$, $f_0 = f_2 = \cdots = 0$, the variables $Y_k$ assuming only the value 1).
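
The convolution property (1.2) of the compound Poisson distribution can be checked numerically. The following Python sketch is an illustration only: the function names are ours, the jump distribution $\{f_j\}$ is supplied as a finite list, and the Poisson sum over the number N of jumps is truncated after `n_terms` terms, which is an assumption appropriate for moderate values of $\lambda t$.

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability e^{-mean} mean^n / n!."""
    return math.exp(-mean) * mean ** n / math.factorial(n)

def convolve(f, g, size):
    """First `size` terms of the convolution of two distributions on 0, 1, 2, ..."""
    return [sum(f[j] * g[n - j] for j in range(n + 1)) for n in range(size)]

def compound_poisson(jump, lam, t, size, n_terms=80):
    """h_0(t), ..., h_{size-1}(t) for a Poisson number of jumps with law `jump`."""
    f = (list(jump) + [0.0] * size)[:size]
    power = [1.0] + [0.0] * (size - 1)        # distribution of S_0 = 0
    h = [0.0] * size
    for n in range(n_terms):
        weight = poisson_pmf(n, lam * t)      # P{N = n}
        h = [hv + weight * pv for hv, pv in zip(h, power)]
        power = convolve(power, f, size)      # distribution of S_{n+1}
    return h
```

With unit jumps, `compound_poisson([0.0, 1.0], lam, t, size)` reduces to the Poisson distribution (1.1); for a general jump law, convolving the results for s and t reproduces the result for $t + s$.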

It will be observed that we have found a characterization of the
simple and the compound Poisson distribution by means of intrinsic
probabilistic properties. The Poisson distribution no longer appears
as an approximation or a limiting form of other distributions but stands
in its own right (or, we might say, as the expression of a physical law).
Its derivation is of a purely analytic character, the notion of a stochastic
process and the random variable X(t) serving only to get a set of plau-
sible postulates on the distribution $\{h_n(t)\}$. For many applications,
nothing beyond the knowledge of $\{h_n(t)\}$ is required. Theoretically,
it should be shown that $\{h_n(t)\}$ really determines a family of random
variables X(t) and all relevant probability relations such as the prob-
ability of the event that X(t) will ever exceed $at + b$ (this is the ruin
problem of the collective risk theory in insurance).

Questions of this type lead beyond the scope of this book. We shall
be content to translate a physical description of a process into proper-
ties required of the basic probabilities $P_n(t)$ and to consider $\{P_n(t)\}$ as
a family of discrete probability distributions depending on t.

This artificial limitation to discrete probabilities has unavoidable
drawbacks. Consider, for example, the zero term in (1.1). We inter-
pret

(1.4)   $$P_0(t) = e^{-\lambda t}$$

as the probability that no call occurs within an observation period of
duration t. This formulation suggests that $P_0(t)$ might be interpreted
as the probability that the waiting time (starting at an arbitrary mo-
ment) up to the first call exceeds t. It can be shown that this interpre-
tation is correct, but it will be noticed that it involves probabilities in
a continuum. The operational meaning of our first formulation is as
follows: Make a series of “identical observations” with a fixed observa-
tional period t. Each trial results in either “no call” (success) or “one
or more calls” (failure). Then we have Bernoulli trials with the prob-
ability of success $e^{-\lambda t}$. With the second interpretation we are to wait
until a call arrives. Every positive number is a possible waiting time,
so that the sample space corresponding to each trial is the half-line
$t > 0$. Formula (1.4) then represents a continuous probability distri-
bution.


2. THE POISSON PROCESS 


We begin by giving a new derivation of the Poisson distribution; it 
is by no means better than the derivation described above, but it lends 
itself more naturally to various generalizations which we propose to 
study. 

Take a system subject to instantaneous changes due to the occur- 
rence of random events such as splitting of physical particles, arrival 
of telephone calls, or breakage of a chromosome under harmful irradia- 
tion. All changes are assumed to be of the same kind, and we are con- 
cerned only with their total number. Each change is represented by a 
point on the time axis, so that we are studying certain random distri- 
butions of points on a line. 

The physical processes which we have in mind are characterized by
the two properties, that they are homogeneous in time and that future
changes are independent of past changes. By this we mean that the
forces and influences which determine the process remain absolutely
unchanged, so that the probability of any particular event is the same
for all time intervals of length t, independent of where this interval is
situated and of the past history of the system.³


We now translate this description into mathematical language. The
process is to be described in terms of probabilities⁴ $P_n(t)$ that exactly n
changes occur during a time interval of length t. In particular, $P_0(t)$
is the probability of no change, and $1 - P_0(t)$ the probability of one
or more changes. We shall assume that⁵ as $t \to 0$


3 In a telephone exchange incoming calls are more frequent during the busiest
hour of the day than, say, between midnight and 1 a.m.; the process is therefore
not homogeneous in time. However, for obvious reasons telephone engineers are
concerned mainly with the “busy hour” of the day, and for that period the process
can be considered homogeneous. Experience shows also that during the busy hour
the incoming traffic follows the Poisson distribution with surprising accuracy.
Similar considerations apply to automobile accidents, which are more frequent on
Sundays, etc.

4 For a non-homogeneous process we should have to introduce the probability
$P_n(t_1, t_2)$ that n changes occur in the interval $t_1 < t < t_2$.

5 This condition can be dispensed with; see section 6.
(2.1)   $$\frac{1 - P_0(t)}{t} \to \lambda,$$

where $\lambda$ is a positive constant. Then for a small interval of length h
the probability of one or more changes is $1 - P_0(h) = \lambda h + o(h)$,
where the term o(h) denotes a quantity which is of smaller order of
magnitude than h. We now formulate our


Postulates for the Poisson Process. Whatever the number of
changes during (0, t), the probability that during (t, t+h) a change occurs
is $\lambda h + o(h)$, and the probability that more than one change occurs is o(h).


These conditions easily lead to a system of differential equations for
$P_n(t)$. Consider two contiguous intervals (0, t) and (t, t+h), where h
is small. If $n \ge 1$, then exactly n changes can occur in the interval
(0, t+h) in three mutually exclusive ways: (1) no change during
(t, t+h) and n changes during (0, t); (2) one change during (t, t+h)
and $n - 1$ changes during (0, t); (3) $x \ge 2$ changes during (t, t+h)
and $n - x$ changes during (0, t). According to our hypotheses, the
probability of the first contingency is $P_n(t)$ times the probability of
no change during (t, t+h) and this last is $1 - \lambda h - o(h)$. Similarly,
the second contingency has probability $P_{n-1}(t)\lambda h + o(h)$, and the last
has a probability of smaller order of magnitude than h. This means
that

(2.2)   $$P_n(t + h) = P_n(t)(1 - \lambda h) + P_{n-1}(t)\lambda h + o(h)$$

or

(2.3)   $$\frac{P_n(t + h) - P_n(t)}{h} = -\lambda P_n(t) + \lambda P_{n-1}(t) + \frac{o(h)}{h}.$$

As $h \to 0$, the last term tends to zero; hence the limit⁶ of the left side
exists and

(2.4)   $$P'_n(t) = -\lambda P_n(t) + \lambda P_{n-1}(t) \qquad (n \ge 1).$$


For $n = 0$ the second and third contingencies mentioned above do not


6 Since we restricted h to positive values, $P'_n(t)$ in (2.4) should be interpreted as
a right-hand derivative. It is really an ordinary two-sided derivative. In fact,
the term o(h) in (2.2) does not depend on t and therefore remains unchanged when
t is replaced by $t - h$. Thus (2.2) implies continuity, and (2.3) implies differen-
tiability in the ordinary sense. This remark applies throughout the chapter and
will not be repeated.
arise, and therefore (2.4) is to be replaced by the simpler equation

(2.5)   $$P_0(t + h) = P_0(t)(1 - \lambda h) + o(h),$$

which leads to

(2.6)   $$P'_0(t) = -\lambda P_0(t).$$

From (2.6) and $P_0(0) = 1$ we get $P_0(t) = e^{-\lambda t}$. Substituting this
$P_0(t)$ into (2.4) with $n = 1$, we get an ordinary differential equation
for $P_1(t)$. Since $P_1(0) = 0$, we find easily that $P_1(t) = \lambda t e^{-\lambda t}$, in
agreement with the Poisson distribution (1.1). Proceeding in the same
way, we find successively all terms of (1.1).
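
The successive solution of (2.6) and (2.4) can also be imitated numerically. The sketch below is a hedged illustration rather than the method of the text: it integrates the first `n_max + 1` equations with the classical fourth-order Runge-Kutta scheme, starting from $P_0(0) = 1$ and $P_n(0) = 0$, and the function name and step count are our own choices. Because each equation involves only $P_n$ and $P_{n-1}$, cutting the system off at `n_max` introduces no error in the retained terms.

```python
import math

def poisson_process_probs(lam, t, n_max, steps=2000):
    """Approximate P_0(t), ..., P_{n_max}(t) by integrating (2.6) and (2.4)."""
    def deriv(P):
        # P_0' = -lam P_0 ;  P_n' = -lam P_n + lam P_{n-1} for n >= 1
        return [-lam * P[0]] + [-lam * P[n] + lam * P[n - 1]
                                for n in range(1, n_max + 1)]

    P = [1.0] + [0.0] * n_max          # initial state: no change at time 0
    h = t / steps
    for _ in range(steps):
        k1 = deriv(P)
        k2 = deriv([p + 0.5 * h * k for p, k in zip(P, k1)])
        k3 = deriv([p + 0.5 * h * k for p, k in zip(P, k2)])
        k4 = deriv([p + h * k for p, k in zip(P, k3)])
        P = [p + h / 6.0 * (a + 2 * b + 2 * c + d)
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P
```

The values returned for, say, `lam = 2.0` and `t = 1.5` can be compared term by term with $e^{-\lambda t}(\lambda t)^n/n!$.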


3. THE PURE BIRTH PROCESS 


In the Poisson process the probability of a change during (t, t+h)
is independent of the number of changes during (0, t). The simplest
generalization consists of dropping this assumption. Assume instead
that, when n changes occur during (0, t), the probability of a new
change during (t, t+h) equals $\lambda_n h$ plus terms of smaller order of mag-
nitude than h; the single constant $\lambda$ characterizing the process is re-
placed by the sequence $\lambda_0, \lambda_1, \lambda_2, \ldots$.

It is convenient to introduce a more flexible terminology. Instead
of saying that n changes occur during (0, t), we shall say that the system
is in state $E_n$. A new change then becomes a transition $E_n \to E_{n+1}$.
In a pure birth process transitions from $E_n$ are possible only to $E_{n+1}$.
Such a process is characterized by the following


Postulates. If at time t the system is in state $E_n$ (n = 0, 1, 2, ...),
then the probability that during (t, t+h) a transition to $E_{n+1}$ occurs equals
$\lambda_n h + o(h)$; the probability of any other change is o(h).

The salient feature of this assumption is that the time which the
system spends in any particular state plays no role; there are sudden
changes of state but no aging as long as the system remains within a
single state.

Again let $P_n(t)$ be the probability that at time t the system is in
state $E_n$. The functions $P_n(t)$ satisfy a system of differential equations
which can be derived by the argument of the preceding section, with
the only change that (2.2) is replaced by

(3.1)   $$P_n(t + h) = P_n(t)(1 - \lambda_n h) + P_{n-1}(t)\lambda_{n-1} h + o(h).$$

In this way we get the basic system of differential equations

(3.2)   $$\begin{aligned} P'_n(t) &= -\lambda_n P_n(t) + \lambda_{n-1} P_{n-1}(t) \qquad (n \ge 1), \\ P'_0(t) &= -\lambda_0 P_0(t). \end{aligned}$$
We can calculate $P_0(t)$ first and then, by recursion, all $P_n(t)$. If the
state of the system represents the number of changes during (0, t),
then the initial state is $E_0$ so that $P_0(0) = 1$ and hence $P_0(t) = e^{-\lambda_0 t}$.
However, the system need not start from state $E_0$ [see example (3.b)].
If at time zero the system is in $E_i$, then we have

(3.3)   $$P_i(0) = 1, \qquad P_n(0) = 0 \quad \text{for } n \neq i.$$

These initial conditions uniquely determine the solution $\{P_n(t)\}$ of
(3.2). (In particular, $P_0(t) = P_1(t) = \cdots = P_{i-1}(t) = 0$.) Explicit
formulas for $P_n(t)$ have been derived independently by many authors
but are of no interest to us. It is easily verified that for arbitrarily
prescribed $\lambda_n$ the system $\{P_n(t)\}$ has all required properties, except
that under certain conditions $\sum P_n(t) < 1$. This phenomenon will be
discussed in section 4.


Examples. (a) Radioactive transmutations. A radioactive atom,
say uranium, may by emission of particles or $\gamma$-rays change to an atom
of a different kind. Each kind represents a possible state of the sys-
tem, and as the process continues, we get a succession of transitions
$E_0 \to E_1 \to E_2 \to \cdots \to E_m$. According to accepted physical theories,
the probability of a transition $E_n \to E_{n+1}$ remains unchanged as long
as the atom is in state $E_n$, and this hypothesis is expressed by our
starting supposition. The differential equations (3.2) therefore describe
the process (a fact well known to physicists). If $E_m$ is the terminal
state from which no further transitions are possible, then $\lambda_m = 0$ and
the system (3.2) terminates with $n = m$. (For $n > m$ we get automati-
cally $P_n(t) = 0$.)

(b) The Yule process. Consider a population of members which can
(by splitting or otherwise) give birth to new members but cannot die.
Assume that during any short time interval of length h each member has
probability $\lambda h + o(h)$ to create a new one; the constant $\lambda$ determines
the rate of increase of the population. If there is no interaction among
the members and at time t the population size is n, then the probability
of an increase during (t, t+h) is $n\lambda h + o(h)$. The probability $P_n(t)$
that the population numbers exactly n elements therefore satisfies (3.2)
with $\lambda_n = n\lambda$, that is,

(3.4)   $$P'_n(t) = -n\lambda P_n(t) + (n - 1)\lambda P_{n-1}(t) \qquad (n \ge 1).$$

If i is the population size at time $t = 0$, then the initial conditions (3.3)
apply. It is easily verified that for $n \ge i$ the solution is given by

(3.5)   $$P_n(t) = \binom{n-1}{n-i} e^{-i\lambda t} \left(1 - e^{-\lambda t}\right)^{n-i},$$
and, of course, $P_n(t) = 0$ for $n < i$. This distribution is a special case
of the negative binomial distribution. Using the definition VI(8.1), we
may rewrite (3.5) as $P_n(t) = f(n-i;\, i,\, e^{-\lambda t})$. It follows [cf. example
IX(8.c)] that the population size at time t is the sum of i independent
random variables each having the distribution obtained from (3.5) on
replacing i by 1. These i variables represent the progenies of the i
original members of our population.

This type of process was first studied by Yule⁷ in connection with
the mathematical theory of evolution. The population consists of the
species within a genus, and the creation of a new element is due to
mutations. The assumption that each species has the same probability
of throwing out a new species neglects the difference in species sizes.
Since we have also neglected the possibility that a species may die out,
formula (3.5) can be expected to give only a crude approximation.
Furry⁸ used the same model to describe a process connected with cosmic
rays, but again the approximation is rather crude. The differential
equations (3.4) apply strictly to a population of particles which can
split into exact replicas of themselves, provided, of course, that there
is no interaction among particles.
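
The claim that (3.5) satisfies (3.4) can be spot-checked numerically. The Python sketch below is an illustration under our own naming: the derivative in (3.4) is replaced by a central difference with a small step, so the residual is only approximately zero, and the mean is computed from a truncated sum. The comparison value $i e^{\lambda t}$ for the mean is the standard mean of this negative binomial law.

```python
import math

def yule_pmf(n, t, lam, i):
    """Formula (3.5): probability of population size n at time t, starting from i."""
    if n < i:
        return 0.0
    decay = math.exp(-lam * t)
    return math.comb(n - 1, n - i) * decay ** i * (1.0 - decay) ** (n - i)

def residual(n, t, lam, i, h=1e-5):
    """Left side minus right side of (3.4), with a central-difference derivative."""
    lhs = (yule_pmf(n, t + h, lam, i) - yule_pmf(n, t - h, lam, i)) / (2 * h)
    rhs = (-n * lam * yule_pmf(n, t, lam, i)
           + (n - 1) * lam * yule_pmf(n - 1, t, lam, i))
    return lhs - rhs

def mean_size(t, lam, i, n_max=400):
    """Expected population size, truncating the sum at n_max (adequate for small lam*t)."""
    return sum(n * yule_pmf(n, t, lam, i) for n in range(i, n_max))
```

For example, `residual(5, 0.7, 1.3, 3)` should be close to zero, and `mean_size(1.0, 1.0, 3)` close to $3e$.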


*4. DIVERGENT BIRTH PROCESSES 


The solution $\{P_n(t)\}$ of the infinite system of differential equations
(3.2) subject to initial conditions (3.3) can be calculated inductively,
starting from $P_i(t) = e^{-\lambda_i t}$. The distribution $\{P_n(t)\}$ is therefore
uniquely determined. From the familiar formulas for solving linear
differential equations it follows also that $P_n(t) \ge 0$. The only question


ematical theory of evolution, based on the conclusions 
of Dr. J. C. Willis, F.R.S., Philosophical Transactions of the Royal Society, London, 

~87. Yule does not introduce the differential 
n(t) by a limiting process similar to the one used in 
I oisson process. Much more general, and more 
flexible, models of the same type were devised and applied to epidemics and popu- 


‘ous and highly interesting Paper by Lieutenant 
, Applications of mathemati 


1 i ge of high. d, 
Physical Reviews, vol. 52 (1937), p. 569. ee ae 
special topic and may be omitted, 
left open is whether $\{P_n(t)\}$ is an honest probability distribution, that
is, whether or not

(4.1)   $$\sum P_n(t) = 1$$

for all t. We shall see that this is not always so: if the coefficients $\lambda_n$
increase sufficiently fast, then it may happen that

(4.2)   $$\sum P_n(t) < 1.$$

At first sight this possibility appears surprising and, perhaps, disturb-
ing, but it finds a ready explanation. The left side in (4.2) may be
interpreted as the probability that during time t only a finite number
of changes takes place. Accordingly, the difference between the two
sides in (4.2) accounts for the possibility of infinitely many changes,
or a sort of explosion. For a better understanding of this phenomenon
let us compare our probabilistic model of growth with the familiar
deterministic approach.

The quantity $\lambda_n$ in (3.2) could be called the average rate of growth
at a time when the population size is n. For example, in the special
case (3.4) we have $\lambda_n = n\lambda$, so that the average rate of growth is pro-
portional to the actual population size. If growth is not subject to
chance fluctuations and has a rate of increase proportional to the in-
stantaneous population size, then x(t) varies in accordance with the
deterministic differential equation

(4.3)   $$\frac{dx(t)}{dt} = \lambda x(t).$$

It follows that at time t the population size is

(4.4)   $$x(t) = i e^{\lambda t},$$

where $i = x(0)$ is the initial population size. The connection between
(3.4) and (4.3) is not purely formal. It is readily seen that (4.4) actu-
ally gives the expected value of the distribution (3.5), so that (4.3)
describes the expected population size, whereas (3.4) takes account of
chance fluctuations.

Let us now consider a deterministic growth process where the rate
of growth increases faster than the population size. To a rate of
growth proportional to $x^2(t)$ there corresponds the differential equation


(4.5)    $\frac{dx(t)}{dt} = \lambda x^2(t)$


whose solution is


(4.6)    $x(t) = \frac{i}{1 - \lambda i t}$.


Note that x(t) increases over all bounds as $t \to 1/(\lambda i)$. In other words,
the assumption that the rate of growth increases as the square of the
population size implies an infinite growth within a finite time interval.
Similarly, if in (3.4) the $\lambda_n$ increase too fast, there is a finite probability
that infinitely many changes take place in a finite time interval. A
precise answer about the conditions when such a divergent growth
occurs is given by the


Theorem. In order that (4.1) may hold for all t it is necessary and
sufficient that the series


(4.7)    $\sum_n \frac{1}{\lambda_n}$

diverge.
Proof. Letting

(4.8)    $S_k(t) = P_0(t) + \dots + P_k(t)$,


we get from (3.2)

(4.9)    $S'_k(t) = -\lambda_k P_k(t)$

and hence for $k \ge i$


(4.10)    $1 - S_k(t) = \lambda_k \int_0^t P_k(\tau)\, d\tau$.


Since all terms in (4.8) are non-negative, the sequence $S_k(t)$, for
fixed t, can only increase with k, and therefore the right side in (4.10)
decreases monotonically with k. Call its limit $\mu(t)$. Then for $k \ge i$


(4.11)    $\lambda_k \int_0^t P_k(\tau)\, d\tau \ge \mu(t)$

and hence

(4.12)    $\int_0^t S_n(\tau)\, d\tau \ge \mu(t) \left( \frac{1}{\lambda_i} + \frac{1}{\lambda_{i+1}} + \dots + \frac{1}{\lambda_n} \right)$.


Because of (4.10) we have $S_n(t) \le 1$, so that the left side in (4.12) is
at most t. If the series (4.7) diverges, the second factor on the right
in (4.12) tends to infinity, and the inequality can hold only if $\mu(t) = 0$
for all t. In this case the right side in (4.10) tends to zero as $k \to \infty$,
and therefore $S_n(t) \to 1$, so that (4.1) holds. Conversely, integrating
(4.8) and using (4.10) we see that the left side of (4.12) is less than
$\lambda_i^{-1} + \lambda_{i+1}^{-1} + \dots + \lambda_n^{-1}$. If the series (4.7) converges, this
expression is bounded and hence it is impossible that $S_n(t) \to 1$ for all t.
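
The criterion of the theorem can be explored through partial sums of the series (4.7). The sketch below compares rates growing linearly in n with rates growing quadratically; the two rate functions, the constant 1 used for $\lambda$, and the cut-offs are illustrative assumptions, not taken from the text.

```python
import math

def partial_sum_reciprocals(rate, i, n_terms):
    """Partial sum of 1/lambda_n over n = i, ..., i + n_terms - 1."""
    return sum(1.0 / rate(n) for n in range(i, i + n_terms))

def linear(n):
    return 1.0 * n          # lambda_n = n * lambda with lambda = 1

def quadratic(n):
    return 1.0 * n * n      # lambda_n = n^2 * lambda with lambda = 1

if __name__ == "__main__":
    for n_terms in (10, 1000, 100000):
        print(n_terms,
              partial_sum_reciprocals(linear, 1, n_terms),
              partial_sum_reciprocals(quadratic, 1, n_terms))
```

For the linear rates the partial sums keep growing (roughly like the logarithm of the number of terms), so (4.1) holds; for the quadratic rates they stay below $\pi^2/6$, which is the situation in which (4.2) can occur.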


5. THE BIRTH AND DEATH PROCESS 


The pure birth process of section 3 provides a satisfactory description 
of radioactive transmutations, but it cannot serve as a realistic model 
for changes in the size of populations whose members can die (or drop 
out). This suggests generalizing the model by permitting transitions 
from the state $E_n$ not only to the next higher state $E_{n+1}$ but also to
the next lower state $E_{n-1}$. (More general processes will be defined in
section 9.) Accordingly we start from the following 


Postulates. The system changes only through transitions from states
to their next neighbors (from $E_n$ to $E_{n+1}$ or $E_{n-1}$ if $n \ge 1$, but from $E_0$
to $E_1$ only). If at any time t the system is in state $E_n$, the probability
that during (t, t+h) the transition $E_n \to E_{n+1}$ occurs equals $\lambda_n h + o(h)$,
and the probability of $E_n \to E_{n-1}$ (if $n \ge 1$) equals $\mu_n h + o(h)$. The
probability that during (t, t+h) more than one change occurs is o(h).


It is easy to adapt the method of section 2 to derive differential
equations for the probabilities $P_n(t)$ of finding the system at time t in
state $E_n$. To calculate $P_n(t+h)$, note that at time t + h the system
can be in state $E_n$ only if one of the following conditions is satisfied:
(1) At time t the system is in $E_n$ and during (t, t+h) no change occurs;
(2) at time t the system is in $E_{n-1}$ and a transition to $E_n$ occurs; (3) at
time t the system is in $E_{n+1}$ and a transition to $E_n$ occurs; (4) during
(t, t+h) two or more transitions occur. By assumption, the probability
of the last event is o(h). The first three contingencies are mutually
exclusive, so that their probabilities add. Therefore


(5.1)    $P_n(t+h) = P_n(t)\{1 - \lambda_n h - \mu_n h\} + \lambda_{n-1} h P_{n-1}(t) + \mu_{n+1} h P_{n+1}(t) + o(h)$.


Transposing the term $P_n(t)$ and dividing the equation by h, we get on
the left the difference ratio of $P_n(t)$. Letting $h \to 0$, we get


(5.2)    $P'_n(t) = -(\lambda_n + \mu_n) P_n(t) + \lambda_{n-1} P_{n-1}(t) + \mu_{n+1} P_{n+1}(t)$.


* By a regrettable oversight the following three lines were missing in the first
printing of the first edition and part of the preceding argument was repeated instead.
The error was corrected after a few months. (The present discussion is continued
in section 10.)

This equation holds for $n \ge 1$. For n = 0 we get in the same way


(5.3)    $P'_0(t) = -\lambda_0 P_0(t) + \mu_1 P_1(t)$.

If at time zero the system is in state $E_i$, the initial conditions are

(5.4)    $P_i(0) = 1$,  $P_n(0) = 0$ for $n \ne i$.


The birth and death process is thus seen to depend on the infinite
system of differential equations (5.2)-(5.3) together with the initial
condition (5.4). The question of existence and of uniqueness of
solutions is in this case by no means trivial. In a pure birth process the
system (3.2) of differential equations was also infinite, but it had the
form of recurrence relations; $P_0(t)$ was determined by the first
equation and $P_n(t)$ could be calculated from $P_{n-1}(t)$. The new system
(5.2) is not of this form, and all $P_n(t)$ must be found simultaneously.
We shall here (and elsewhere in this chapter) state properties of the
solutions without proof.
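
Because all $P_n(t)$ must be found simultaneously, a numerical treatment has to work on the whole vector at once. The following is a minimal sketch, not a method used in the text: it truncates the system (5.2)-(5.3) to the states $E_0, \dots, E_N$ (suppressing births out of $E_N$ so that total probability is conserved) and advances the initial condition (5.4) with classical fourth-order Runge-Kutta steps. The truncation level N, the step dt, and all function names are assumptions made for illustration.

```python
def forward_rhs(P, lam, mu):
    """Right sides of (5.2)-(5.3) on states 0..N, with births out of state N suppressed."""
    N = len(P) - 1
    dP = [0.0] * (N + 1)
    for n in range(N + 1):
        birth_out = lam(n) if n < N else 0.0   # truncation assumption: no exit upward from N
        death_out = mu(n) if n > 0 else 0.0    # no transition downward from E_0
        dP[n] = -(birth_out + death_out) * P[n]
        if n > 0:
            dP[n] += lam(n - 1) * P[n - 1]
        if n < N:
            dP[n] += mu(n + 1) * P[n + 1]
    return dP

def integrate(lam, mu, i, t_end, N=40, dt=0.02):
    """Advance the initial condition (5.4) to time t_end by Runge-Kutta steps of size dt."""
    P = [0.0] * (N + 1)
    P[i] = 1.0
    for _ in range(int(round(t_end / dt))):
        k1 = forward_rhs(P, lam, mu)
        k2 = forward_rhs([p + 0.5 * dt * k for p, k in zip(P, k1)], lam, mu)
        k3 = forward_rhs([p + 0.5 * dt * k for p, k in zip(P, k2)], lam, mu)
        k4 = forward_rhs([p + dt * k for p, k in zip(P, k3)], lam, mu)
        P = [p + dt * (a + 2 * b + 2 * c + d) / 6.0
             for p, a, b, c, d in zip(P, k1, k2, k3, k4)]
    return P

if __name__ == "__main__":
    P = integrate(lambda n: 1.0, lambda n: 2.0, i=0, t_end=5.0)
    print(sum(P), P[:4])
```

The step dt must be small compared with the reciprocal of the largest rate that occurs among the retained states; for rapidly growing coefficients the truncation itself hides exactly the phenomena discussed in section 4, so the sketch is only meaningful when the neglected states carry negligible probability.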

For arbitrarily prescribed coefficients $\lambda_n \ge 0$, $\mu_n \ge 0$ there always exists
a positive solution $\{P_n(t)\}$ of (5.2)-(5.4) such that $\sum P_n(t) \le 1$. If the
coefficients are bounded (or increase sufficiently slowly), this solution is
unique and satisfies the regularity condition $\sum P_n(t) = 1$. However, it
is possible to choose the coefficients in such a way that $\sum P_n(t) < 1$
and that there exist infinitely many solutions. In the latter case we
encounter a phenomenon analogous to that studied in the preceding
section for the pure birth process. This situation is of considerable
theoretical interest, but the reader may safely assume that in all


class of stochastic processes, Proceedings National Academy Sciences, USA,
vol. 41 (1955), pp. 387-391; forthcoming papers by the same authors and
another by B. O. Koopman, both in the Transactions American Mathematical
Society.

cases of practical significance the conditions of uniqueness are
satisfied; in this case automatically $\sum P_n(t) = 1$ (see section 10).

When $\lambda_0 = 0$ the transition $E_0 \to E_1$ is impossible. In the
terminology of Markov chains $E_0$ is an absorbing state from which no exit is
possible; once the system is in $E_0$ it stays there. From (5.3) it follows
that in this case $P'_0(t) \ge 0$, so that $P_0(t)$ increases monotonically.
The limit $P_0(\infty)$ is the probability of ultimate absorption.

More generally, it can be shown that the limits


(5.5)    $\lim_{t \to \infty} P_n(t) = p_n$


exist and are independent of the initial conditions (5.4); they satisfy the
system of linear equations obtained from (5.2)-(5.3) on putting
$P'_n(t) = 0$. The relation (5.5) is usually interpreted as a “tendency
toward the steady state condition” and this suggestive name has caused
much confusion. It must be understood that, except when $E_0$ is an
absorbing state, the chance fluctuations continue forever unabated and
(5.5) shows only that in the long run the influence of the initial
condition disappears. The remarks made in chapter XV, section 6,
concerning the statistical equilibria apply here without change.
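
The linear equations obtained from (5.2)-(5.3) with $P'_n(t) = 0$ can be solved recursively: the equation for n = 0 gives $\lambda_0 p_0 = \mu_1 p_1$, and substituting this into the successive equations leaves $\lambda_n p_n = \mu_{n+1} p_{n+1}$ for every n. The sketch below applies this recursion on a finite range of states and normalizes the result. It assumes that all $\mu_{n+1}$ used are positive and that the weights are summable (otherwise the only solution is $p_n = 0$); the cut-off n_states and the function name are illustrative.

```python
def limiting_distribution(lam, mu, n_states):
    """Solve the stationary form of (5.2)-(5.3) on states 0..n_states-1 and normalize."""
    weights = [1.0]                              # p_n / p_0, starting with n = 0
    for n in range(n_states - 1):
        # lam_n p_n = mu_{n+1} p_{n+1}, obtained by induction from the stationary equations
        weights.append(weights[-1] * lam(n) / mu(n + 1))
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    p = limiting_distribution(lambda n: 1.0, lambda n: 2.0, 60)
    print(p[:4])   # constant coefficients: successive terms are halved
```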

The truth of (5.5) can be proved either from explicit formulas for
the $P_n(t)$ or from general ergodic theories. Intuitively the theorem
becomes almost obvious by a comparison of our process with a simple
Markov chain with transition probabilities


(5.6)    $p_{n,n+1} = \frac{\lambda_n}{\lambda_n + \mu_n}, \qquad p_{n,n-1} = \frac{\mu_n}{\lambda_n + \mu_n}$.


In this chain the only direct transitions are $E_n \to E_{n+1}$ and $E_n \to E_{n-1}$,
and they have the same conditional probabilities as in our process; the
difference between the chain and our process lies in the fact that, with
the latter, changes can occur at arbitrary times, so that the number of
transitions during time t is a random variable. However, for large t
this number is certain to be large, and hence it is plausible that for
$t \to \infty$ the probabilities $P_n(t)$ behave as the corresponding probabilities
of the simple chain.

The principal field of applications of the birth and death process is
to problems of waiting times, trunking, etc.; see sections 6 and 7.


Examples. (a) Linear growth. Suppose that a population consists
of elements which can split or die. During any short time interval of
length h the probability for any living element to split into two is
$\lambda h + o(h)$, whereas the corresponding probability of dying is $\mu h + o(h)$.
Here $\lambda$ and $\mu$ are two constants characteristic of the population. If
there is no interaction among the elements, we are led to a birth and
death process with $\lambda_n = n\lambda$, $\mu_n = n\mu$. The basic differential equations
take on the form


(5.7)    $P'_0(t) = \mu P_1(t)$,
         $P'_n(t) = -(\lambda + \mu) n P_n(t) + \lambda (n-1) P_{n-1}(t) + \mu (n+1) P_{n+1}(t)$.


Explicit solutions can be found * (cf. problems 9-11), but we shall
not discuss this aspect. The limits (5.5) exist and satisfy (5.7) with
$P'_n(t) = 0$. From the first equation we find $p_1 = 0$, and we see by
induction from the second equation that $p_n = 0$ for all $n \ge 1$. If
$p_0 = 1$, we may say that the probability of ultimate extinction is 1.
If $p_0 < 1$, the relations $p_1 = p_2 = \dots = 0$ imply that with probability
$1 - p_0$ the population increases over all bounds; ultimately the
population must either die out or increase indefinitely. To find the
probability $p_0$ of extinction we compare the process to the related Markov
chain. In our case the transition probabilities (5.6) are independent
of n, and we have therefore an ordinary random walk in which the
steps to the right and left have probabilities $p = \lambda/(\lambda + \mu)$ and
$q = \mu/(\lambda + \mu)$, respectively. The state $E_0$ (or x = 0) is an absorbing
barrier. We know from the classical ruin problem (see chapter XIV)
that in such a walk starting from position i the probability of ultimate
absorption at 0 is 1 if $p \le q$, and $(q/p)^i$ if $p > q$. Accordingly, the
probability $p_0 = \lim P_0(t)$ of ultimate extinction is 1 if $\lambda \le \mu$, and
$(\mu/\lambda)^i$ if $\lambda > \mu$, where i is the initial population size. (This is easily
verified from the explicit solution; see problem 10.)
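
The extinction probability can be checked against the random walk with step probabilities (5.6). The sketch below is an illustration under an added assumption: a second absorbing barrier is placed at a distant position N, the probability of reaching 0 before N is computed from the difference equation of the walk, and N is taken large so that the result approximates the probability of ever reaching 0. The function name and the value N = 200 are hypothetical choices.

```python
def ruin_probability(lam, mu, i, N=200):
    """Probability that the walk started at i reaches 0 before N, for step
    probabilities p = lam/(lam+mu) to the right and q = mu/(lam+mu) to the left."""
    ratio = mu / lam                       # q / p
    # With u_k the probability of reaching 0 first from k (u_0 = 1, u_N = 0), the
    # differences d_k = u_k - u_{k+1} satisfy d_k = (q/p)**k * d_0 and add up to 1.
    powers = [ratio ** k for k in range(N)]
    d0 = 1.0 / sum(powers)
    return 1.0 - d0 * sum(powers[:i])

if __name__ == "__main__":
    print(ruin_probability(2.0, 1.0, 3), (1.0 / 2.0) ** 3)   # case lam > mu
    print(ruin_probability(1.0, 2.0, 3))                     # case lam < mu
```

For $\lambda > \mu$ the value agrees with $(\mu/\lambda)^i$ up to a term that vanishes as N grows; for $\lambda < \mu$ it is indistinguishable from 1. In the boundary case $\lambda = \mu$ the barrier version gives 1 - i/N, which tends to 1 only slowly.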

As in many similar cases, the explicit solution of (5.7) is rather
complicated, and it is desirable to calculate the mean and the variance
of the distribution $\{P_n(t)\}$ directly from the differential equations. We
have for the mean


(5.8)    $M(t) = \sum_{n=1}^{\infty} n P_n(t)$.


We shall omit a formal proof that M(t) is finite and that the following
formal operations are justified (again both points follow readily from

* A systematic way consists in deriving a partial differential equation for the
... more general process where the coefficients
$\lambda$ and $\mu$ in (5.7) are permitted to depend on time is discussed in detail in David G.
Kendall, The generalized “birth and death” process, Annals of Mathematical
Statistics, vol. 19 (1948), pp. 1-15. See also the same author's Stochastic processes
and population growth, Journal of the Royal Statistical Society, B, vol. 11 (1949),
... to take account of the age distribution

the solution given in problem 10). Multiplying the second equation
in (5.7) by n and adding over n = 1, 2, ..., we find that the terms
containing $n^2$ cancel, and we get

(5.9)    $M'(t) = \lambda \sum (n-1) P_{n-1}(t) - \mu \sum (n+1) P_{n+1}(t) = (\lambda - \mu) M(t)$.


This is a differential equation for M(t). At time t = 0 the population
size is i, and hence M(0) = i. Therefore


(5.10)    $M(t) = i e^{(\lambda - \mu) t}$.


We see that the mean tends to 0 or infinity, according as $\lambda < \mu$ or
$\lambda > \mu$. The variance of $\{P_n(t)\}$ can be calculated in a similar way
(cf. problem 12).

(b) Waiting lines for a single channel. In the simplest case of
constant coefficients $\lambda_n = \lambda$, $\mu_n = \mu$ the birth and death process reduces
to a special case of the waiting line example (7.b) when a = 1.


6. EXPONENTIAL HOLDING TIMES 


The principal field of applications of the pure birth and death proc- 
ess is connected with trunking in telephone engineering and various 
types of waiting lines for telephones, counters, or machines. This type 
of problem can be treated with various degrees of mathematical so- 
phistication. The method of the birth and death process offers the 
easiest approach, but this model is based on a mathematical simplification
known as the assumption of exponential holding times. We begin
with a discussion of this basic assumption. 

For concreteness of language let us consider a telephone conversa- 
tion, and let us assume that its length is necessarily an integral number 
of seconds. We treat the length of the conversation as a random 
variable X and assume its probability distribution pn = P{X = n} 
known. The telephone line then represents a physical system with
two possible states, “busy” ($E_0$) and “free” ($E_1$). If at an arbitrary
moment t the line is busy, then the probability of a change in state
during the next second depends on how long the conversation has been 
going on. In other words, the past has an influence on the future, 
and our process is therefore not a Markov process (see chapter XV, 
section 10). This circumstance is the source of most difficulties in 
more complicated problems. However, there exists a simple exceptional 
case discussed at length in chapter XIII, section 9. 

Imagine that the decision whether or not the conversation is to be
continued is made each second at random by means of a skew coin.
In other words, a sequence of Bernoulli trials with probability p of
success is performed at a rate of one per second and continued until the
first success. The conversation ends when this first success occurs. In
this case the total length of the conversation, the “holding time,” has
the geometric distribution $p_n = q^{n-1} p$. If at any time t the line is
busy, the probability that it will remain busy for more than one
second is q, and the probability of the transition $E_0 \to E_1$ at the next
step is p. These probabilities are now independent of how long the
line was busy.
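
The independence from the past can be made concrete with the tail probabilities of the geometric distribution: the chance that a conversation which has already lasted m seconds lasts at least n seconds more is the same as the chance that a new conversation lasts more than n seconds. A small sketch follows; the function names and numerical values are illustrative.

```python
def geometric_tail(n, p):
    """P{X > n} when X is the number of the trial with the first success: q**n, q = 1 - p."""
    return (1.0 - p) ** n

def conditional_tail(m, n, p):
    """P{X > m + n | X > m}."""
    return geometric_tail(m + n, p) / geometric_tail(m, p)

if __name__ == "__main__":
    p = 0.2
    for m in (0, 5, 50):
        # The printed value does not depend on m.
        print(m, conditional_tail(m, 3, p), geometric_tail(3, p))
```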

Without discretizing the time parameter we have to deal with con- 
tinuous random variables. The role of the geometric distribution for 


probability that the conversation lasts for longer than t time units is
given by an exponential $e^{-\lambda t}$. We have encountered this “exponential
holding time distribution” as the zero term in the Poisson distribution
(1.1), that is, as the waiting time up to the occurrence of the first
change.


The 


It remains to characterize the so-called incoming traffic (a 
calls, machine breakdowns, etc.). We shall assu 


13 For conversations between ci 
three minutes, and the holding ti 
minutes. This is a systematic deviation from the exponential law, and our theory

It is easy to verify the described property of exponential holding times. Denote
by u(t) the probability that a conversation lasts for at least t time units. The
probability u(t + s) that a conversation starting at time 0 lasts beyond t + s
equals the probability that it lasts longer than t units multiplied by the conditional
probability that a conversation lasts additional s units, given that its length
exceeds t. If the past duration has no influence, the last conditional probability must
equal u(s); that is, we must have


(6.1)    $u(t + s) = u(t)\, u(s)$.

It remains to prove the


Theorem. Let u(t) be defined for t > 0 and bounded in each finite interval. If
u(t) satisfies (6.1), then either u(t) = 0 for all t, or $u(t) = e^{-\lambda t}$ for some constant $\lambda$.


Proof. If u(t) does not vanish identically, there exists a point x such that u(x) > 0.
Let $\lambda = -\log u(x)$ and $v(t) = e^{\lambda t} u(xt)$. Then


(6.2)    $v(t + s) = v(t)\, v(s), \qquad v(1) = 1$


and we shall prove that v(t) = 1 for all t > 0. Clearly $v^2(1/2) = v(1) = 1$, and
generally $v^n(1/n) = v(1) = 1$ for each integer n > 0. Therefore v(1/n) = 1 and thence
$v(m/n) = v^m(1/n) = 1$ for each pair of integers m > 0, n > 0. Hence v(r) = 1 for
each rational r. Suppose now that $v(\tau) = c \ne 1$. Then $v(r - \tau) = c^{-1}$ for every
rational $r > \tau$, and we may assume c > 1. In this case $v(N\tau) = v^N(\tau) = c^N$ can be
made arbitrarily large by choosing N sufficiently large. Now choose a rational r in the
interval


$N\tau - 1 < r < N\tau$.

Then


(6.3)    $v(N\tau - r) = v(N\tau - r)\, v(r) = v(N\tau) = c^N$


which shows that there exist points $a = N\tau - r$ in the interval 0 < a < 1 such
that v(a) > c. This contradicts the assumption that u(t), and therefore v(t),
are bounded in each finite interval.
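
The proof ends with $v(t) = 1$; the passage back to $u$ is a short substitution, written out here for completeness. The symbols $\lambda'$ and $s$ are introduced only for this remark and do not occur in the text.

```latex
v(t) = e^{\lambda t}\, u(xt) = 1
\;\Longrightarrow\;
u(xt) = e^{-\lambda t}
\;\Longrightarrow\;
u(s) = e^{-\lambda' s},
\qquad \lambda' = \lambda / x, \quad s = xt > 0 .
```

Thus the constant appearing in the statement of the theorem is $\lambda/x$ in the notation of the proof.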


7. WAITING LINE AND SERVICING PROBLEMS 


(a) The simplest trunking problem.“ Suppose that infinitely many 
trunks or channels are available, and that the probability of a conver- 


C. Palm, Intensitätsschwankungen im Fernsprechverkehr, Ericsson Technics
(Stockholm), no. 44 (1948), pp. 1-189, in particular p. 57. Waiting line and
trunking problems for telephone exchanges were studied long before the theory of
stochastic processes was available and had a stimulating influence on the development
of the theory. In particular, Palm’s impressive work over many years has proved
useful to several authors. The earliest worker in the field was A. K. Erlang
(1878-1929). See E. Brockmeyer, H. L. Halstrøm, and Arne Jensen, The life and works
of A. K. Erlang, Transactions of the Danish Academy Technical Sciences, No. 2,
Copenhagen, 1948. Independently valuable pioneer work has been done by T. C.
Fry whose book, quoted in footnote 4 of chapter VI, did much for the
development of engineering applications of probability.

It is, of course, assumed that the durations of the conversations are
mutually independent. If n lines are busy, the probability that one
of them will be freed within time h is then $n\mu h + o(h)$. The probability
that within this time two or more conversations terminate is obviously
of the order of magnitude $h^2$ and therefore negligible. The probability
of a new call arriving within this time is $\lambda h + o(h)$, so that we have a
birth and death process with


(7.1)    $\lambda_n = \lambda, \qquad \mu_n = n\mu$.

The basic differential equations (5.2)-(5.3) take the form ($n \ge 1$)

(7.2)    $P'_0(t) = -\lambda P_0(t) + \mu P_1(t)$,
         $P'_n(t) = -(\lambda + n\mu) P_n(t) + \lambda P_{n-1}(t) + (n+1)\mu P_{n+1}(t)$.


Explicit solutions can be obtained by deriving a partial differential
equation for the generating function (cf. problem 13). We shall only
determine the quantities $p_n = \lim P_n(t)$ of (5.5). They satisfy the
equations

(7.3)    $\lambda p_0 = \mu p_1$,
         $(\lambda + n\mu) p_n = \lambda p_{n-1} + (n+1)\mu p_{n+1}$.

We find by induction that $p_n = p_0 (\lambda/\mu)^n / n!$, and hence


(7.4)    $p_n = e^{-\lambda/\mu} \frac{(\lambda/\mu)^n}{n!}$.


Thus, the limiting distribution is a Poisson distribution with parameter
$\lambda/\mu$. It is independent of the initial state.


It is easy to find the mean $M(t) = \sum n P_n(t)$. Multiplying the nth equation of
(7.2) by n and adding, we get, taking into account that the $P_n(t)$ add to unity,

(7.5)    $M'(t) = \lambda - \mu M(t)$.

When the initial state is $E_i$, then M(0) = i, and


(7.6)    $M(t) = \frac{\lambda}{\mu}\left(1 - e^{-\mu t}\right) + i e^{-\mu t}$.


As $t \to \infty$, we see that M(t) approaches the mean of the Poisson distribution found
above. Incidentally, the reader may verify that in the special case i = 0 the
$P_n(t)$ are given exactly by a Poisson distribution with mean M(t).
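
The step from (7.2) to (7.5) can be written out as follows. This is a sketch of the formal computation only; it assumes, as the text does, that the $P_n(t)$ add to unity and that the series involved may be rearranged term by term.

```latex
\begin{aligned}
M'(t) &= \sum_{n \ge 1} n P'_n(t) \\
      &= -\lambda \sum_{n} n P_n - \mu \sum_{n} n^2 P_n
         + \lambda \sum_{n \ge 1} n P_{n-1} + \mu \sum_{n \ge 1} n(n+1) P_{n+1} \\
      &= -\lambda M - \mu \sum_{n} n^2 P_n
         + \lambda (M + 1) + \mu \Bigl( \sum_{n} n^2 P_n - M \Bigr) \\
      &= \lambda - \mu M(t).
\end{aligned}
```

Here $\sum_{n \ge 1} n P_{n-1} = \sum_m (m+1) P_m = M + 1$ and $\sum_{n \ge 1} n(n+1) P_{n+1} = \sum_m (m^2 - m) P_m$. Differentiating (7.6) gives $(\lambda - i\mu) e^{-\mu t}$, which equals $\lambda - \mu M(t)$, so that (7.6) indeed solves (7.5) with M(0) = i.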
(b) Waiting lines for a finite number of channels.* We now modify
the last example to obtain a more realistic model. The assumptions
are the same, except that the number a of trunklines or channels is finite.
If all channels are busy, each new call joins a waiting line and waits until
a channel is freed. This means that all trunklines have a common
waiting line.

The word “trunk” may be replaced by counter at a postoffice and
“conversation” by service. We are actually treating the general waiting
line problem for the case where a person has to wait only if all a
channels are busy.

We say that the system is in state $E_n$ if there are exactly n persons either
being served or in the waiting line. Such a line exists only when n > a,
and then there are n - a persons in it.

As long as at least one channel is free, we are in exactly the same
situation as in the preceding example. However, if the system is in a
state $E_n$ with n > a, only a conversations are going on, and we have
therefore $\mu_n = a\mu$ for $n \ge a$. The basic system of differential
equations is therefore given by (7.2) for n < a, but for $n \ge a$ by


(7.7)    $P'_n(t) = -(\lambda + a\mu) P_n(t) + \lambda P_{n-1}(t) + a\mu P_{n+1}(t)$.


In the special case of a single channel (a = 1) these equations reduce
to those of a birth and death process with coefficients independent of n.
The limits $p_n$ of (5.5) exist; they satisfy (7.3) for n < a, and


(7.8)    $(\lambda + a\mu) p_n = \lambda p_{n-1} + a\mu p_{n+1}$

for $n \ge a$. By recursion we find again that

(7.9)    $p_n = p_0 \frac{(\lambda/\mu)^n}{n!}, \qquad n \le a$,

(7.10)    $p_n = \frac{(\lambda/\mu)^n}{a!\, a^{n-a}}\, p_0, \qquad n \ge a$.


The series $\sum (p_n/p_0)$ converges only if


(7.11)    $\frac{\lambda}{\mu} < a$.

Hence, if (7.11) does not hold, a limiting distribution $\{p_n\}$ cannot exist.
In this case $p_n = 0$ for all n, which means that gradually the waiting line
grows over all bounds. On the other hand, if (7.11) holds, then we can


* A. Kolmogoroff, Sur le problème d'attente, Recueil Mathématique [Sbornik],
Vol. 38, 1931, pp. 101-106.

determine $p_0$ so that the sum of the expressions (7.9) and (7.10) equals
unity. From the explicit expressions for $P_n(t)$, which we have not
derived, however, it can be shown that the $p_n$ thus obtained really
represent the limiting distribution of the $P_n(t)$. Table 1 gives a numerical
illustration for a = 3, $\lambda/\mu = 2$.


TABLE 1
LIMITING PROBABILITIES IN THE CASE OF a = 3 CHANNELS AND $\lambda/\mu = 2$


n                 0       1       2       3       4       5       6       7
Lines busy        0       1       2       3       3       3       3       3
People waiting    0       0       0       0       1       2       3       4
p_n               0.1111  0.2222  0.2222  0.1481  0.0988  0.0658  0.0439  0.0293
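
The entries of Table 1 can be recomputed from (7.9) and (7.10). In the sketch below the normalizing constant $p_0$ is obtained by adding the finite sum over n < a to the geometric series over $n \ge a$, which converges exactly when (7.11) holds. The function name and argument names are illustrative, and rho stands for $\lambda/\mu$.

```python
import math

def waiting_line_limits(a, rho, n_max):
    """Limiting probabilities p_0..p_{n_max} from (7.9)-(7.10), with rho = lambda/mu."""
    if not rho < a:
        raise ValueError("condition (7.11) fails: no limiting distribution exists")
    # 1/p_0: terms with n < a, plus the geometric series of the terms with n >= a
    inv_p0 = sum(rho ** n / math.factorial(n) for n in range(a))
    inv_p0 += (rho ** a / math.factorial(a)) / (1.0 - rho / a)
    p0 = 1.0 / inv_p0
    probs = []
    for n in range(n_max + 1):
        if n <= a:
            probs.append(p0 * rho ** n / math.factorial(n))                       # (7.9)
        else:
            probs.append(p0 * rho ** n / (math.factorial(a) * a ** (n - a)))      # (7.10)
    return probs

if __name__ == "__main__":
    for n, p in enumerate(waiting_line_limits(3, 2.0, 7)):
        print(n, round(p, 4))
```

For a = 3 and $\lambda/\mu = 2$ the normalizing sum equals 9, so that $p_0 = 1/9$, in agreement with the first entry of the table.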


For orientation we begin with the simplest case and generalize it in the next
example. The problem is as follows.


We say that the system is in state $E_n$ if n machines are not working. For
$1 \le n \le m$ this means that one machine is being serviced and n - 1
are in the waiting line; in the state $E_0$ all machines are working and the
repairman is idle.


Examples (c) and (d), including the numerical illustrations, are taken from an
article by C. Palm, The distribution of repairmen in servicing automatic machines
(in Swedish), Industritidningen Norden, vol. 75 (1947), pp. 75-80, 90-94, 119-123.
Palm gives tables and graphs for the most economical number of repairmen.

A transition $E_n \to E_{n+1}$ is caused by a breakdown of one among the
m - n working machines, whereas a transition $E_n \to E_{n-1}$ occurs if
the machine being serviced reverts to the working state. Hence we
have a birth and death process with coefficients

(7.12)    $\lambda_n = (m - n)\lambda, \qquad \mu_0 = 0, \qquad \mu_1 = \mu_2 = \dots = \mu_m = \mu$


and the basic differential equations (5.2) and (5.3) become
($1 \le n \le m - 1$):


(7.13)    $P'_0(t) = -m\lambda P_0(t) + \mu P_1(t)$,
          $P'_n(t) = -\{(m - n)\lambda + \mu\} P_n(t) + (m - n + 1)\lambda P_{n-1}(t) + \mu P_{n+1}(t)$,
          $P'_m(t) = -\mu P_m(t) + \lambda P_{m-1}(t)$.


This is a finite system of differential equations and can be solved by
ordinary methods. The limits (5.5) exist and satisfy the equations


(7.14)    $m\lambda p_0 = \mu p_1$,
          $\{(m - n)\lambda + \mu\} p_n = (m - n + 1)\lambda p_{n-1} + \mu p_{n+1}$,
          $\mu p_m = \lambda p_{m-1}$.


It follows easily that the recursion formula


(7.15)    $(m - n)\lambda p_n = \mu p_{n+1}$

holds. Substituting successively n = m-1, m-2, ..., 1, 0, we get

(7.16)    $p_{m-k} = \frac{1}{k!} \left( \frac{\mu}{\lambda} \right)^k p_m$.


The remaining unknown constant $p_m$ can be obtained from the
condition that the $p_j$ add to unity:


$\frac{1}{p_m} = 1 + \frac{1}{1!} \left( \frac{\mu}{\lambda} \right) + \frac{1}{2!} \left( \frac{\mu}{\lambda} \right)^2 + \dots + \frac{1}{m!} \left( \frac{\mu}{\lambda} \right)^m$.

Formula (7.16) is well known among trunking engineers as Erlang’s
loss formula. Typical numerical values are exhibited in table 2.

TABLE 2
PROBABILITIES $p_n$ FOR THE CASE $\lambda/\mu = 0.1$, m = 6
(ERLANG'S LOSS FORMULA)


n    Machines in Waiting Line    p_n
0    0                           0.4845
1    0                           0.2907
2    1                           0.1454
3    2                           0.0582
4    3                           0.0175
5    4                           0.0035
6    5                           0.0003

The probability $p_0$ may be interpreted as the probability of the
repairman’s being idle (in the example of table 2 he should be idle about
half the time). The expected number of machines in the waiting line is


(7.17)    $w = \sum_{k=1}^{m} (k - 1) p_k = \sum_{k=1}^{m} k p_k - (1 - p_0)$.


This quantity can be calculated by adding the relations (7.15) for
n = 0, 1, ..., m. Using the fact that the $p_n$ add to unity, we get


$m\lambda - \lambda w - \lambda (1 - p_0) = \mu (1 - p_0)$

or

(7.18)    $w = m - \frac{\lambda + \mu}{\lambda} (1 - p_0)$.
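
The figures for the single repairman can be recomputed from (7.16), and the two expressions for w can be compared. The sketch below is an illustration; the function names are hypothetical, and lam_over_mu denotes the ratio $\lambda/\mu$.

```python
import math

def one_repairman_limits(m, lam_over_mu):
    """Limiting probabilities p_0..p_m from (7.16) and the normalization that follows it."""
    x = 1.0 / lam_over_mu                                          # mu / lambda
    terms = [x ** k / math.factorial(k) for k in range(m + 1)]     # p_{m-k} / p_m
    p_m = 1.0 / sum(terms)
    return [p_m * terms[m - n] for n in range(m + 1)]              # index n = m - k

def mean_waiting(p):
    """Expected number of machines in the waiting line, first form of (7.17)."""
    return sum((k - 1) * p[k] for k in range(1, len(p)))

if __name__ == "__main__":
    m, ratio = 6, 0.1
    p = one_repairman_limits(m, ratio)
    w_direct = mean_waiting(p)
    w_formula = m - (1.0 + 1.0 / ratio) * (1.0 - p[0])             # (7.18)
    print([round(x, 4) for x in p])
    print(w_direct, w_formula)
```

With m = 6 and $\lambda/\mu = 0.1$ both expressions give w close to 0.33, that is, about one third of a machine waiting on the average.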


signifies that r machines are being serviced and n - r machines are
in the waiting line. We can use the setup of the preceding example,
except that (7.12) is obviously to be replaced by

(7.19)    $\lambda_0 = m\lambda, \qquad \mu_0 = 0$,
          $\lambda_n = (m - n)\lambda, \qquad \mu_n = n\mu \qquad (1 \le n \le r)$,
          $\lambda_n = (m - n)\lambda, \qquad \mu_n = r\mu \qquad (r \le n \le m)$.


We shall not write down the basic system of differential equations but
only the equations for the limiting probabilities $p_n$. They are


(7.20)    $m\lambda p_0 = \mu p_1$,
          $\{(m - n)\lambda + n\mu\} p_n = (m - n + 1)\lambda p_{n-1} + (n + 1)\mu p_{n+1} \qquad (1 \le n < r)$,
          $\{(m - n)\lambda + r\mu\} p_n = (m - n + 1)\lambda p_{n-1} + r\mu p_{n+1} \qquad (r \le n \le m)$.


From the first equation we get the ratio $p_1/p_0$. From the second
equation we get by induction for n < r


(7.21)    $(n + 1)\mu p_{n+1} = (m - n)\lambda p_n$;

finally, for $n \ge r$ we get from the last equation in (7.20)

(7.22)    $r\mu p_{n+1} = (m - n)\lambda p_n$.


These equations permit calculating successively the ratios $p_n/p_0$.
Finally, $p_0$ follows from the condition $\sum p_n = 1$. The values in table 3
are obtained in this way.


TABLE 3
PROBABILITIES $p_n$ FOR THE CASE $\lambda/\mu = 0.1$, m = 20, r = 3


n     Machines    Machines    Repairmen    p_n
      Serviced    Waiting     Idle
0     0           0           3            0.13625
1     1           0           2            0.27250
2     2           0           1            0.25888
3     3           0           0            0.15533
4     3           1           0            0.08802
5     3           2           0            0.04694
6     3           3           0            0.02347
7     3           4           0            0.01095
8     3           5           0            0.00475
9     3           6           0            0.00190
10    3           7           0            0.00070
11    3           8           0            0.00023
12    3           9           0            0.00007

A comparison of tables 2 and 3 reveals surprising facts. Note that
both tables refer to the same machines ($\lambda/\mu = 0.1$), but in the second
case we have m = 20 machines and r = 3 repairmen. The number of
machines per repairman has increased from 6 to 6 2/3, but at the same
time, the machines are serviced more efficiently. Let us define a
coefficient of loss for machines by


(7.23)    $\frac{w}{m} = \frac{\text{average number of machines in waiting line}}{\text{number of machines}}$


and a coefficient of loss for repairmen by


(7.24)    $\frac{\rho}{r} = \frac{\text{average number of repairmen idle}}{\text{number of repairmen}}$.


For practical purposes we may identify the probabilities $P_n(t)$ with
their limits $p_n$. In table 3 we have then $w = p_4 + 2p_5 + 3p_6 + \dots + 17p_{20}$
and $\rho = 3p_0 + 2p_1 + p_2$. Table 4 proves conclusively that
for our particular machines three repairmen for twenty machines are ever
so much more economical than one repairman for six machines.


TABLE 4
COMPARISON OF EFFICIENCIES OF TWO SYSTEMS DISCUSSED IN EXAMPLES (c) AND (d)


                                     I         II
Number of machines                   6         20
Number of repairmen                  1         3
Machines per repairman               6         6 2/3
Coefficient of loss for repairmen    0.4845    0.4042
Coefficient of loss for machines     0.0549    0.01694
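
The two coefficients of loss can be recomputed from the recursions (7.21) and (7.22) together with the definitions (7.23) and (7.24). The sketch below is illustrative; the function names are hypothetical, and the case r = 1 reproduces the single repairman of example (c).

```python
def several_repairmen_limits(m, r, lam_over_mu):
    """Limiting probabilities p_0..p_m built from the recursions (7.21) and (7.22)."""
    ratios = [1.0]                                    # p_n / p_0
    for n in range(m):
        servers = (n + 1) if n < r else r             # factor multiplying mu on the left side
        ratios.append(ratios[-1] * (m - n) * lam_over_mu / servers)
    total = sum(ratios)
    return [x / total for x in ratios]

def loss_coefficients(p, r):
    """Return (w/m, rho/r) as defined in (7.23) and (7.24)."""
    m = len(p) - 1
    w = sum((n - r) * p[n] for n in range(r + 1, m + 1))    # average number waiting
    rho = sum((r - n) * p[n] for n in range(r))             # average number of idle repairmen
    return w / m, rho / r

if __name__ == "__main__":
    for m, r in ((6, 1), (20, 3)):
        p = several_repairmen_limits(m, r, 0.1)
        print(m, r, loss_coefficients(p, r))
```

For m = 20, r = 3 the computation gives a coefficient of loss of about 0.0169 for machines and about 0.404 for repairmen; for m = 6, r = 1 it gives about 0.0549 and 0.4845.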


American Institute of Electrical Engineers, vol. 65 (1946), pp. 630-632.

We say that the system is in state $E_n$ if n welders are using current.
Thus we have only finitely many states $E_0, \dots, E_a$.

If the system is in state $E_n$, then a - n welders are not using current
and the probability for a new call for current within time h is
$(a - n)\lambda h + o(h)$; on the other hand, the probability that one of the n
welders ceases using current is $n\mu h + o(h)$. Hence we have a birth
and death process with


(7.25)    $\lambda_n = (a - n)\lambda, \qquad \mu_n = n\mu, \qquad 0 \le n \le a$.

The basic differential equations become ($1 \le n \le a - 1$)

(7.26)    $P'_0(t) = -a\lambda P_0(t) + \mu P_1(t)$,
          $P'_n(t) = -\{n\mu + (a - n)\lambda\} P_n(t) + (n + 1)\mu P_{n+1}(t) + (a - n + 1)\lambda P_{n-1}(t)$,
          $P'_a(t) = -a\mu P_a(t) + \lambda P_{a-1}(t)$.


It is easily verified that the limiting probabilities are given by the
binomial distribution


(7.27)    $p_n = \binom{a}{n} \left( \frac{\lambda}{\lambda + \mu} \right)^n \left( \frac{\mu}{\lambda + \mu} \right)^{a-n}$,


a result which could have been anticipated on intuitive grounds.
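
The verification can also be carried out numerically by inserting the binomial probabilities into the equations (7.26) with all derivatives put equal to zero. The sketch below does this for arbitrarily chosen values of a, $\lambda$ and $\mu$; the function names and the sample values are illustrative.

```python
import math

def binomial_limits(a, lam, mu):
    """Binomial distribution with success probability lam / (lam + mu)."""
    p = lam / (lam + mu)
    return [math.comb(a, n) * p ** n * (1.0 - p) ** (a - n) for n in range(a + 1)]

def balance_residuals(a, lam, mu):
    """Right sides of (7.26) evaluated at the binomial probabilities; all should vanish."""
    p = binomial_limits(a, lam, mu)
    res = [-a * lam * p[0] + mu * p[1]]
    for n in range(1, a):
        res.append(-(n * mu + (a - n) * lam) * p[n]
                   + (n + 1) * mu * p[n + 1] + (a - n + 1) * lam * p[n - 1])
    res.append(-a * mu * p[a] + lam * p[a - 1])
    return res

if __name__ == "__main__":
    print(max(abs(x) for x in balance_residuals(8, 0.7, 1.3)))   # rounding error only
```

The intuitive reason is that each welder alternates independently between using and not using current, and is found using current with probability $\lambda/(\lambda + \mu)$ in the long run.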


8. THE BACKWARD (RETROSPECTIVE) EQUATIONS 


In the preceding sections we were studying the probabilities $P_n(t)$
of finding the system at time t in state $E_n$. This notation is convenient
but misleading, inasmuch as it omits mentioning the initial state $E_i$
of the system at time zero. For theoretical purposes it is therefore
more natural to introduce the notation $P_{in}(t)$; this is the probability
that the system is at time t in state $E_n$, given that at time zero it was in $E_i$.
The $P_{in}(t)$ will be called transition probabilities.

It must be emphasized that we have been studying these transition
probabilities all along and that nothing is changed but notation. When
the initial state is known to be $E_i$, then $\{P_{in}(t)\}$ is the absolute
probability distribution at time t. When at time zero we have only a
probability distribution $\{q_i\}$ for the initial state, then the probability of $E_n$
at time t is


(8.1)    $Q_n(t) = \sum_i q_i P_{in}(t)$.


In the case of the pure birth process and of the birth and death
process, we found that for an arbitrary fixed i the transition probabilities
$P_{in}(t)$ satisfy the basic differential equations (3.2) and (5.2). The
subscript i appears only in the initial conditions, which should now be
written


(8.2)    $P_{in}(0) = 1$ for n = i, and $P_{in}(0) = 0$ otherwise.


These basic differential equations were derived by prolonging the
time interval (0, t) to (0, t+h) and considering the possible changes
during the short time (t, t+h). We could as well have prolonged the
interval (0, t) in the direction of the past and considered the changes
during (-h, 0). In this way we get a new system of differential
equations in which n (instead of i) remains fixed.

Consider first the case of a pure birth process and let us neglect
events whose probability tends to zero faster than h. If the system
passed from $E_i$ ($i \ge 0$) at time -h to $E_n$ at time t, then at time 0 it
was with probability 1 - o(h) either at $E_i$ or at $E_{i+1}$. By the method
of sections 2 and 3 we conclude that


(8.3)    $P_{in}(t + h) = P_{in}(t)(1 - \lambda_i h) + P_{i+1,n}(t) \lambda_i h + o(h)$.

Hence for $i \ge 0$ the new basic system now takes the form

(8.4)    $P'_{in}(t) = -\lambda_i P_{in}(t) + \lambda_i P_{i+1,n}(t)$.


These equations are called the backward equations, and, for distinction,
equations (3.2) are called the forward equations. The initial conditions
are (8.2). (Intuitively one should expect that


(8.5)    $P_{in}(t) = 0$ if n < i,


but pathological exceptions exist; see section 10).


In the case of the birth and death process, if the system is at time
-h in $E_i$, then at time zero it should be in $E_{i+1}$, $E_i$, or $E_{i-1}$, and the
same argument leads to the backward equations


(8.6)    $P'_{in}(t) = -(\lambda_i + \mu_i) P_{in}(t) + \lambda_i P_{i+1,n}(t) + \mu_i P_{i-1,n}(t)$.

These equations correspond to (5.2).


It should be clear that the forward and b i 
independent of each other ; the pee 


Example. The Poisson process. In section 2 we have interpreted
the Poisson expression (1.1) as the probability that exactly n calls
arrive during any time interval of length t. Let us say that at time t
the system is in state $E_n$ if exactly n calls arrive within the time interval
from 0 to t. A transition from $E_i$ at $t_1$ to $E_n$ at $t_2$ means that n - i
calls arrived during $(t_1, t_2)$. This is possible only if $n \ge i$, and hence
we have for the transition probabilities of the Poisson process


(8.7)    $P_{in}(t) = e^{-\lambda t} \frac{(\lambda t)^{n-i}}{(n - i)!}$ if $n \ge i$,
         $P_{in}(t) = 0$ if n < i.

The forward and backward equations are, respectively,

(8.8)    $P'_{in}(t) = -\lambda P_{in}(t) + \lambda P_{i,n-1}(t)$

and

(8.9)    $P'_{in}(t) = -\lambda P_{in}(t) + \lambda P_{i+1,n}(t)$,


and it is easily verified that (8.7) is a solution of both systems and
satisfies the initial condition (8.2).


9. GENERALIZATION; THE KOLMOGOROV EQUATIONS 


So far the theory has been restricted to processes in which direct
transitions from a state $E_n$ are possible only to the neighboring states
$E_{n+1}$ and $E_{n-1}$. Moreover, the processes have been time-homogeneous,
that is to say, the transition probabilities $P_{in}(t)$ have been the same
for all time intervals of length t. We now consider more general
processes in which both assumptions are dropped.

As in the theory of ordinary Markov chains, we shall permit direct
transitions from any state $E_i$ to any state $E_n$. The transition
probabilities are permitted to vary in time. This necessitates specifying
the two endpoints of any time interval instead of specifying just its
length. Accordingly, we shall write $P_{in}(\tau, t)$ for the conditional
probability of finding the system at time t in state $E_n$, given that at a previous
instant $\tau$ the state was $E_i$. The symbol $P_{in}(\tau, t)$ is meaningless unless
$\tau < t$. If the process is homogeneous in time, then $P_{in}(\tau, t)$ depends only
on the difference $t - \tau$, and we can write $P_{in}(t)$ instead of $P_{in}(\tau, \tau + t)$
(which is then independent of $\tau$).

The principal property of our processes is the Markov property
discussed in chapter XV, section 10: Given the state of the system at any
time, future changes are independent of the past. More precisely,
consider three moments $\tau < s < t$ and suppose that at time $\tau$ the
system is in state $E_i$ and at time $s$ in state $E_\nu$. For an arbitrary
process the (conditional) probability of finding the system at time $t$ in
state $E_n$ depends on both $i$ and $\nu$; in other words, not only the
"present state" $E_\nu$, but also the past state $E_i$, has an influence on
the state at time $t$. However, for a Markov process this is not so. For it
the considered probability equals $P_{\nu n}(s, t)$, the probability of a
transition from $E_\nu$ at time $s$ to $E_n$ at time $t$; the knowledge that
at time $\tau < s$ the system was in state $E_i$ permits no inference about
the future. This assumption leads directly to an important conclusion. The
passage from $E_i$ at time $\tau$ to $E_n$ at time $t$ must occur via some
state $E_\nu$ at time $s$, and for a Markov process the probability that the
passage goes via a particular state $E_\nu$ is
$P_{i\nu}(\tau, s) P_{\nu n}(s, t)$. It follows that we must have

(9.1)    $P_{in}(\tau, t) = \sum_\nu P_{i\nu}(\tau, s) P_{\nu n}(s, t)$

identically for all $\tau < s < t$. This is the Chapman-Kolmogorov equation.
It is the counterpart, for the case of a continuous time parameter, to
equation XV(10.3), which is valid when the time parameter assumes integral
values only.


It was shown in chapter XV, section 10, that the Chapman-Kolmogorov
... stochastic processes. For our ... class of processes with which ...;
once (9.1) is given we can easily derive differential equations which
determine the ... proceed in a purely analytical way.


In the case of time-homogeneous processes, equation (9.1) assumes
the simpler form

(9.2)    $P_{in}(t + s) = \sum_\nu P_{i\nu}(t) P_{\nu n}(s).$

For the Poisson process this equation reduces to the convolution
property of the Poisson distribution [example XI(2.c)].
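
As an illustration of (9.2), the sketch below checks the Chapman-Kolmogorov identity for the Poisson transition probabilities (8.7). It is only a numerical spot check under assumed parameter values; the helper names are hypothetical and not taken from the text.

```python
import math


def poisson_transition(i, n, t, lam):
    """P_in(t) from (8.7): probability of n - i arrivals during time t."""
    if n < i:
        return 0.0
    k = n - i
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)


def chapman_kolmogorov_sum(i, n, t, s, lam):
    """Right side of (9.2): sum over the intermediate state nu."""
    # Terms with nu < i or nu > n vanish, so the sum is finite.
    return sum(poisson_transition(i, nu, t, lam) * poisson_transition(nu, n, s, lam)
               for nu in range(i, n + 1))


if __name__ == "__main__":
    i, n, t, s, lam = 1, 6, 0.4, 1.1, 2.0  # illustrative values
    print(poisson_transition(i, n, t + s, lam))
    print(chapman_kolmogorov_sum(i, n, t, s, lam))
```

The finite sum is exactly the convolution of two Poisson distributions with parameters $\lambda t$ and $\lambda s$.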
We now introduce our fundamental regularity conditions which in
an obvious way generalize the starting assumptions of the birth and
death process.

Assumption 1. To every state $E_n$ there corresponds a continuous
function $c_n(t) \ge 0$ such that as $h \to 0$

(9.3)    $\frac{1 - P_{nn}(t, t+h)}{h} \to c_n(t).$

The probabilistic interpretation of (9.3) is obvious; if at time $t$ the
system is in state $E_n$, the probability that during $(t, t+h)$ a change
occurs is $c_n(t)h + o(h)$. Analytically, relations (9.3) require that
$P_{nn}(t, s) \to 1$ as $s \to t$, and that $P_{nn}(t, x)$ has at $x = t$ a
derivative. The function $c_n(t)$ plays the role of $\lambda_n + \mu_n$ in
the birth and death process. In the case of a time-homogeneous process,
$c_n$ is independent of $t$.

Assumption 2. To every pair of states $E_j$, $E_k$ with $j \ne k$ there
correspond transition probabilities $p_{jk}(t)$ (depending on time) such
that as $h \to 0$

(9.4)    $\frac{P_{jk}(t, t+h)}{h} \to c_j(t) p_{jk}(t)$    $(j \ne k).$

The $p_{jk}(t)$ are continuous in $t$, and for every fixed $t$,

(9.5)    $\sum_k p_{jk}(t) = 1,$    $p_{jj}(t) = 0.$
Here $p_{jk}(t)$ can be interpreted as the conditional probability that, if
a change from $E_j$ occurs during $(t, t+h)$, this change takes the system
from $E_j$ to $E_k$. In the birth and death process

(9.6)    $p_{j,j+1}(t) = \frac{\lambda_j}{\lambda_j + \mu_j},$    $p_{j,j-1}(t) = \frac{\mu_j}{\lambda_j + \mu_j},$

and $p_{jk}(t) = 0$ for all other combinations of $j$ and $k$. For every
fixed $t$ the $p_{jk}(t)$ can be interpreted as transition probabilities of
a Markov chain.

The two assumptions suffice to derive a system of backward equations
for the $P_{jk}(\tau, t)$, but for the forward equations we require in
addition

Assumption 3. For fixed $k$ the passage to the limit in (9.4) is uniform
with respect to $j$.

The necessity of this assumption is of considerable interest for the
theory of infinite systems of differential equations and will be discussed
in the next section.

We proceed to derive differential equations for the $P_{ik}(\tau, t)$ as
functions of $t$ and $k$ (forward equations). From equation (9.1) we have

(9.7)    $P_{ik}(\tau, t+h) = \sum_j P_{ij}(\tau, t) P_{jk}(t, t+h).$


Expressing the term $P_{kk}(t, t+h)$ on the right in accordance with (9.3),
we get

(9.8)    $\frac{P_{ik}(\tau, t+h) - P_{ik}(\tau, t)}{h} = -c_k(t) P_{ik}(\tau, t) + \frac{1}{h} \sum_{j \ne k} P_{ij}(\tau, t) P_{jk}(t, t+h) + \dots$

where the neglected terms tend to 0 with $h$, and the sum extends over
all $j$ except $j = k$. We can now apply (9.4) to the terms of the sum.
Since (by assumption 3) the passage to the limit is uniform in $j$, the
right side has a limit. Hence also the left side has a limit, which means
that $P_{ik}(\tau, t)$ has a partial derivative with respect to $t$, and

(9.9)    $\frac{\partial P_{ik}(\tau, t)}{\partial t} = -c_k(t) P_{ik}(\tau, t) + \sum_j P_{ij}(\tau, t) c_j(t) p_{jk}(t).$

... infinite system of functions $P_{ik}(\tau, t)$, $k = 0, 1, 2, \dots$. The
parameters $i$ and $\tau$ appear only in the initial condition

(9.10)    $P_{ik}(\tau, \tau) = 1$ for $k = i$, and $0$ otherwise.


A system of backward equations can be derived similarly, and the
derivation is actually simpler since we can dispense with assumption 3
entirely. As for equations (9.3) and (9.4), it is more natural to use the
forms

(9.3a)    $\frac{1 - P_{nn}(t-h, t)}{h} \to c_n(t),$

(9.4a)    $\frac{P_{jk}(t-h, t)}{h} \to c_j(t) p_{jk}(t)$    $(j \ne k).$

These relations can be shown to be equivalent to (9.3) and (9.4), but
we shall simply start from (9.3a), (9.4a), and (9.5) as our basic
assumptions. Rewriting the Chapman-Kolmogorov equation (9.1) in the form

(9.11)    $P_{ik}(\tau-h, t) = \sum_\nu P_{i\nu}(\tau-h, \tau) P_{\nu k}(\tau, t)$

and using (9.3a) with $n = i$, we get

(9.12)    $\frac{P_{ik}(\tau-h, t) - P_{ik}(\tau, t)}{h} = -c_i(\tau) P_{ik}(\tau, t) + \frac{1}{h} \sum_{\nu \ne i} P_{i\nu}(\tau-h, \tau) P_{\nu k}(\tau, t) + \frac{o(h)}{h}.$
Here $h^{-1} P_{i\nu}(\tau-h, \tau) \to c_i(\tau) p_{i\nu}(\tau)$ and the
passage to the limit in the sum to the right in (9.12) is always uniform.
In fact, if $N > i$ we have

(9.13)    $0 \le h^{-1} \sum_{\nu=N+1}^{\infty} P_{i\nu}(\tau-h, \tau) P_{\nu k}(\tau, t) \le h^{-1} \sum_{\nu=N+1}^{\infty} P_{i\nu}(\tau-h, \tau) \le$
          $\le h^{-1} \{1 - \sum_{\nu=0}^{N} P_{i\nu}(\tau-h, \tau)\} \to c_i(\tau) \{1 - \sum_{\nu=0}^{N} p_{i\nu}(\tau)\}.$

In view of condition (9.5) the right side can be made arbitrarily small
by choosing $N$ sufficiently large. It follows that a termwise passage
to the limit in (9.12) is permitted and we obtain

(9.14)    $\frac{\partial P_{ik}(\tau, t)}{\partial \tau} = c_i(\tau) P_{ik}(\tau, t) - c_i(\tau) \sum_\nu p_{i\nu}(\tau) P_{\nu k}(\tau, t).$

This, together with the initial condition (9.10), is the basic system of
backward differential equations.
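
For a finite number of states and constant $c_j$, $p_{jk}$ the two systems can be compared numerically. The sketch below is an illustration under assumptions not made in the text: a three-state chain with arbitrarily chosen rates, time-homogeneity (so that $P_{ik}(\tau, t)$ depends on $t - \tau$ only), and a classical Runge-Kutta integrator. In matrix form the forward system (9.9) reads $P' = PA$ and the backward system (9.14) reads $P' = AP$, where $A$ has entries $c_j p_{jk}$ off the diagonal and $-c_j$ on it.

```python
def generator(c, p):
    """Matrix A with a_jk = c_j p_jk for j != k and a_jj = -c_j."""
    n = len(c)
    return [[-c[j] if j == k else c[j] * p[j][k] for k in range(n)]
            for j in range(n)]


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][m] * y[m][k] for m in range(n)) for k in range(n)]
            for i in range(n)]


def add_scaled(x, y, a):
    """Return the matrix x + a * y."""
    return [[xv + a * yv for xv, yv in zip(rx, ry)] for rx, ry in zip(x, y)]


def rk4(rhs, start, t, steps):
    """Classical Runge-Kutta integration of P' = rhs(P) over [0, t]."""
    h = t / steps
    p = start
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(add_scaled(p, k1, h / 2))
        k3 = rhs(add_scaled(p, k2, h / 2))
        k4 = rhs(add_scaled(p, k3, h))
        # k1 + 2 k2 + 2 k3 + k4
        incr = add_scaled(add_scaled(k1, k2, 2.0), add_scaled(k4, k3, 2.0), 1.0)
        p = add_scaled(p, incr, h / 6)
    return p


if __name__ == "__main__":
    c = [1.0, 2.0, 0.5]                      # assumed rates c_j
    p = [[0.0, 0.5, 0.5],                    # assumed p_jk, rows sum to 1, p_jj = 0
         [0.3, 0.0, 0.7],
         [1.0, 0.0, 0.0]]
    a = generator(c, p)
    identity = [[1.0 if j == k else 0.0 for k in range(3)] for j in range(3)]
    forward = rk4(lambda m: matmul(m, a), identity, 1.0, 200)
    backward = rk4(lambda m: matmul(a, m), identity, 1.0, 200)
    print(forward)
    print(backward)
```

In the finite case both systems have the same solution; the distinction between them becomes essential only for infinitely many states.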

The two systems of differential equations were first derived by A.
Kolmogorov,¹⁹ who laid the foundations of the theory of Markov processes.
It has been shown²⁰ that there always exists a common solution
$\{P_{ik}(\tau, t)\}$ of the two systems which satisfies the
Chapman-Kolmogorov equation (9.1) and

(9.15)    $P_{ik}(\tau, t) \ge 0,$    $\sum_k P_{ik}(\tau, t) \le 1.$

We know from the pure birth process (section 4) that the $P_{ik}(\tau, t)$
need not add to unity, the difference $1 - \sum_k P_{ik}(\tau, t)$
accounting for the possibility of infinitely many transitions within the
finite time interval $(\tau, t)$. If $\sum_k P_{ik}(\tau, t) = 1$, the
solution $\{P_{ik}(\tau, t)\}$ is unique, but in general different processes
may satisfy the same forward and backward equation (see section 10). From
the point of view of applications, the possibility of the inequality
$\sum_k P_{ik}(\tau, t) < 1$ may be safely disregarded.

19 Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung,
Mathematische Annalen, vol. 104 (1931), pp. 415-458.
20 See footnote 10.


Example. Generalized Poisson process. Consider the case where all
$c_i(t)$ equal the same constant, $c_i(t) = \lambda$, and the $p_{jk}$ are
independent of $t$. In this case the $p_{jk}$ are the transition
probabilities of an ordinary Markov chain and (as in chapter XV) we denote
its higher transition probabilities by $p_{jk}^{(n)}$.

From $c_i(t) = \lambda$ it follows that the probability of a transition
occurring during the interval $(t, t+h)$ is independent of the state of the
system at time $t$ and equals $\lambda h + o(h)$. This implies that the
number of transitions within the interval $(\tau, t)$ has a Poisson
distribution with parameter $\lambda(t - \tau)$. Given that exactly $n$
transitions occurred, the (conditional) probability of a passage from $j$
to $k$ is $p_{jk}^{(n)}$. Hence

(9.16)    $P_{jk}(\tau, t) = e^{-\lambda(t-\tau)} \sum_{n=0}^{\infty} \frac{\lambda^n (t-\tau)^n}{n!} p_{jk}^{(n)}$

(where, as usual, $p_{jj}^{(0)} = 1$ and $p_{jk}^{(0)} = 0$ for $j \ne k$).
It is easily verified that (9.16) is in fact a solution of the two systems
(9.9) and (9.14) of differential equations satisfying the boundary
condition (9.10).

If, in particular,

(9.17)    $p_{jk} = 0$ for $k < j$,    $p_{jk} = f_{k-j}$ for $k \ge j$,

(9.16) reduces to the compound Poisson distribution of chapter XII,
section 1.
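
Formula (9.16) lends itself to direct computation. The following sketch evaluates a truncated version of the series for a small chain and checks two properties claimed in the text: the rows add to unity and the Chapman-Kolmogorov equation holds. The three-state matrix, the value of $\lambda$, and the truncation at 80 terms are assumptions made only for this illustration.

```python
import math


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][m] * y[m][k] for m in range(n)) for k in range(n)]
            for i in range(n)]


def generalized_poisson(p, lam, elapsed, terms=80):
    """Truncated series (9.16); `elapsed` stands for t - tau."""
    n = len(p)
    power = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]  # p^(0)
    total = [[0.0] * n for _ in range(n)]
    weight = math.exp(-lam * elapsed)  # Poisson probability of zero transitions
    for m in range(terms):
        for j in range(n):
            for k in range(n):
                total[j][k] += weight * power[j][k]
        power = matmul(power, p)            # higher transition probabilities p^(m+1)
        weight *= lam * elapsed / (m + 1)   # Poisson probability of m+1 transitions
    return total


if __name__ == "__main__":
    chain = [[0.0, 1.0, 0.0],   # assumed p_jk with p_jj = 0, as required by (9.5)
             [0.5, 0.0, 0.5],
             [0.0, 1.0, 0.0]]
    print(generalized_poisson(chain, 2.0, 0.8))
```

The truncation is harmless here because the Poisson weights decay factorially; for large $\lambda(t - \tau)$ more terms would be needed.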


10. PROCESSES INVOLVING ESCAPES 


The example of the pure birth process (sections 3 and 4) proves that
the transition probabilities $P_{ik}(t)$ determined from the Kolmogorov
differential equations do not necessarily add to unity; it can happen that

(10.1)    $\sum_k P_{ik}(t) < 1.$

... not always ... Processes with these properties are usually called
pathological.²¹ With a better
understanding there came the realization that we are really confronted
with a simple and natural analogue to the familiar situation in diffusion
theory. The occurrence of (10.1) no longer appeared disturbing, but
led to the gratifying discovery that the theory of the Kolmogorov dif- 
ferential equations shares the basic features of diffusion theory. Despite 
the completely different appearance of the basic equations and the ana- 
lytical apparatus involved, we encounter in both theories the same type 
of boundary conditions and other similarities; each theory is better 
understood in the light of the other, and no sharp boundaries can in 
fact be drawn. In this way the theory of Markov processes has 
achieved an unexpected and pleasing internal unity. 

Let us reconsider the simple pure birth process of section 3. The
system spends some time at the initial state $E_0$, moves from there to
$E_1$, stays for a while there, moves on to $E_2$, etc. The probability
$P_0(t)$ that the sojourn time in $E_0$ exceeds $t$ is obtained from (3.2)
as $P_0(t) = e^{-\lambda_0 t}$. This sojourn time, $T_0$, is a random
variable, but its range is the positive $t$-axis and therefore formally out
of bounds for this book. However, the step from a geometric distribution to
an exponential being trivial, we may with impunity trespass a trifle. An
approximation to $T_0$ by a discrete random variable with a geometric
distribution shows that it is natural to define the expected sojourn time
at $E_0$ by

(10.2)    $E(T_0) = \int_0^\infty t \lambda_0 e^{-\lambda_0 t} dt = \lambda_0^{-1}.$

At the moment when the system enters $E_j$, the state $E_j$ takes over the
role of the initial state and the same conclusion applies to the sojourn
time $T_j$ at $E_j$: The expected sojourn time at $E_j$ is
$E(T_j) = \lambda_j^{-1}$. It follows that
$\lambda_0^{-1} + \lambda_1^{-1} + \dots + \lambda_n^{-1}$ is the expected
duration of the time it takes the system to pass through
$E_0, E_1, \dots, E_n$, and we can restate the criterion of section 4 as
follows:

In order that $\sum P_n(t) = 1$ for all $t$ it is necessary and sufficient
that

(10.3)    $\sum E(T_j) = \sum \lambda_j^{-1} = \infty;$

that is, the total expected duration of the time spent at
$E_0, E_1, E_2, \dots$ must be infinite. Of course,
$L_0(t) = 1 - \sum P_n(t)$ is the probability that the system has gone
through all states before time $t$.

21 The counterpart of this section in the first edition was entitled
"degenerate processes."

In this form the theorem is extremely plausible. If the expected
sojourn time at $E_j$ is $2^{-j}$, the probability that the system has
passed through all states within time $1 + 2^{-1} + 2^{-2} + \dots = 2$ must
be positive. Similarly, a particle moving along the $x$-axis at an
exponentially increasing velocity traverses the entire axis in a finite
time.
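
The plausibility argument can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the text: it takes $\lambda_j = 2^j$, replaces the infinite chain by its first 40 states (the neglected sojourn times have total expectation $2^{-39}$), and estimates both the mean time to pass through all states and the fraction of runs that do so before time 2.

```python
import random


def time_to_infinity(rng, levels=40):
    """Total sojourn time in E_0, ..., E_{levels-1} when the rate at E_j is 2**j."""
    return sum(rng.expovariate(2.0 ** j) for j in range(levels))


def estimate(samples=20000, horizon=2.0, seed=1):
    """Return (mean passage time, fraction of runs finished before `horizon`)."""
    rng = random.Random(seed)
    times = [time_to_infinity(rng) for _ in range(samples)]
    mean = sum(times) / samples
    fraction = sum(1 for t in times if t < horizon) / samples
    return mean, fraction


if __name__ == "__main__":
    mean, fraction = estimate()
    print("estimated mean passage time:", mean)
    print("estimated probability of explosion before time 2:", fraction)
```

The estimated mean should lie near 2, in agreement with $\sum 2^{-j} = 2$, and a positive fraction of runs reaches "infinity" before time 2.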

If the birth process serves as a model of population growth, the
state $E_n$ stands for an actual population size $n$ and reaching infinity
in a finite time expresses a sort of explosion. In this connection (10.1)
represents indeed a singular anomaly, but for other applications it may
appear as a regular affair. Geometrically speaking, there is no reason
to place the states $E_0, E_1, E_2, \dots$ at the points $0, 1, 2, \dots$ of
the $x$-axis. Imagine instead $E_n$ placed at the point $x_n$ of the
$x$-axis, where $0 = x_0 < x_1 < x_2 < \dots$ and $x_n \to 1$. The birth
process may then be pictured as the motion of a "particle" starting at
$x_0 = 0$, jumping to


...

(10.4)    ...

Whether or not the point 1 is actually reached in a finite time (or only
asymptotically approached) depends on the convergence ... over the
reciprocal velocity. In the probabilistic ...

Considered as transition probabilities the $P_n(t)$ of section 3 should
be written as $P_{in}(t)$. The basic differential equations (3.2) apply
equally to $P_{ik}(t)$ for an arbitrary (but fixed) $i$, and we have the
forward equations

(10.5)    $P'_{i0}(t) = -\lambda_0 P_{i0}(t),$    $P'_{ik}(t) = -\lambda_k P_{ik}(t) + \lambda_{k-1} P_{i,k-1}(t)$

where $i \ge 0$ is fixed and $k = 1, 2, \dots$. In (8.4) and (8.5) we have
the corresponding backward equations

(10.6)    $P'_{ik}(t) = -\lambda_i P_{ik}(t) + \lambda_i P_{i+1,k}(t)$

where $k \ge 0$ is fixed and $i = 0, 1, 2, \dots$. The initial conditions
are the obvious ones:

(10.7)    $P_{ii}(0) = 1,$    $P_{ik}(0) = 0,$  $k \ne i.$

A glance at (10.5) shows that $P_{i0}(t)$ is determined uniquely by the
first differential equation together with the boundary condition (10.7).
We can then calculate $P_{ik}(t)$ successively for $k = 1, 2, \dots$. How
to solve a first order linear equation is well known, and we have the
easily verified formula for the unique solution of the forward equation
(10.5) with the initial condition (10.7):

(10.8)    $P_{ik}(t) = 0$ for $k < i$,    $P_{ii}(t) = e^{-\lambda_i t},$
          $P_{ik}(t) = \lambda_{k-1} \int_0^t e^{-\lambda_k s} P_{i,k-1}(t - s) ds$    $(k > i).$
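
The successive determination of the $P_{ik}(t)$ can be imitated numerically. The sketch below integrates the forward equations (10.5) for $i = 0$ and the states $E_0, \dots, E_{30}$ with a classical Runge-Kutta scheme. The rates $\lambda_k = (k+1)^2$, for which $\sum \lambda_k^{-1} < \infty$, the cut-off at 31 states, and the step size are assumptions chosen for illustration; the sum of the computed probabilities then falls well short of unity, whereas for constant rates it stays practically equal to one.

```python
def forward_rhs(prob, rates):
    """Right side of (10.5) for the vector (P_00, P_01, ..., P_0K)."""
    out = [-rates[0] * prob[0]]
    for k in range(1, len(prob)):
        out.append(-rates[k] * prob[k] + rates[k - 1] * prob[k - 1])
    return out


def solve_forward(rates, t, steps):
    """Integrate (10.5) from the initial condition (10.7) with i = 0."""
    h = t / steps
    prob = [1.0] + [0.0] * (len(rates) - 1)
    for _ in range(steps):
        k1 = forward_rhs(prob, rates)
        k2 = forward_rhs([p + h / 2 * d for p, d in zip(prob, k1)], rates)
        k3 = forward_rhs([p + h / 2 * d for p, d in zip(prob, k2)], rates)
        k4 = forward_rhs([p + h * d for p, d in zip(prob, k3)], rates)
        prob = [p + h / 6 * (a + 2 * b + 2 * c + d)
                for p, a, b, c, d in zip(prob, k1, k2, k3, k4)]
    return prob


if __name__ == "__main__":
    explosive = [float((k + 1) ** 2) for k in range(31)]  # assumed rates
    regular = [1.0] * 31                                  # constant rates for comparison
    print(sum(solve_forward(explosive, 3.0, 3000)))
    print(sum(solve_forward(regular, 3.0, 3000)))
```

The step size must be small compared with the reciprocal of the largest rate, since the explicit scheme becomes unstable otherwise.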


The situation is completely different for the backward equations
(10.6). We shall show that when $\sum \lambda_n^{-1} < \infty$ there is no
uniqueness of the solution.

Lemma. The unique solution $P_{ik}(t)$ of the forward equations (10.5)
given by (10.8) is automatically a solution of the backward equations
(10.6). If $\bar{P}_{ik}(t)$ is any non-negative solution of (10.6)-(10.7)
then

(10.9)    $\bar{P}_{ik}(t) \ge P_{ik}(t).$


Proof. Consider (10.6) putting a bar over all $P_{ik}(t)$. The $i$th
equation may be solved as a linear differential equation for
$\bar{P}_{ik}(t)$ to obtain

(10.10)    $\bar{P}_{ik}(t) = \lambda_i \int_0^t e^{-\lambda_i s} \bar{P}_{i+1,k}(t - s) ds$    $(k \ne i),$
           $\bar{P}_{ii}(t) = e^{-\lambda_i t} + \lambda_i \int_0^t e^{-\lambda_i s} \bar{P}_{i+1,i}(t - s) ds.$

[Note that this is not a recursive system and cannot be used to solve
the system of equations (10.6).]
Let $P_{ik}(t)$ stand for the solution of the forward equations given by
(10.8). For each $k$ and $i > k$ put $\bar{P}_{ik}(t) = P_{ik}(t) = 0$.
These functions satisfy (10.10). Furthermore (10.10) defines
$\bar{P}_{kk}(t) = e^{-\lambda_k t} = P_{kk}(t)$. Letting in (10.10)
successively $i = k-1, k-2, \dots$, we get $\bar{P}_{ik}(t)$ defined for all
$i$ and they are a solution of the backward equations (10.6) with the
initial conditions (10.7). Clearly $\bar{P}_{k-1,k}(t) = P_{k-1,k}(t)$. We
shall verify by induction that $\bar{P}_{ik}(t) = P_{ik}(t)$. Suppose that
this identity holds for all combinations $i$, $k$ such that $k - i \le r$
where $r \ge 1$ is an integer (we know this to be true for $r = 1$), and
let $k - i = r + 1$. The integral in (10.10) expresses $\bar{P}_{ik}$ in
terms of $\bar{P}_{i+1,k} = P_{i+1,k}$, and (10.8) in turn expresses
$P_{i+1,k}$ as an integral involving $P_{i+1,k-1} = \bar{P}_{i+1,k-1}$. We
get thus $\bar{P}_{ik}$ as a double integral over $\bar{P}_{i+1,k-1}$.
Reversing the order of integration we get

(10.11)    $\bar{P}_{ik}(t) = \lambda_{k-1} \int_0^t e^{-\lambda_k x} dx \cdot \lambda_i \int_0^{t-x} e^{-\lambda_i s} \bar{P}_{i+1,k-1}(t - x - s) ds =$
           $= \lambda_{k-1} \int_0^t e^{-\lambda_k x} \bar{P}_{i,k-1}(t - x) dx.$

By the induction hypothesis $\bar{P}_{i,k-1}(t) = P_{i,k-1}(t)$, and a
comparison of (10.8) and (10.11) proves that $\bar{P}_{ik}(t) = P_{ik}(t)$
as asserted.

It remains to prove that (10.9) holds for an arbitrary solution
$\bar{P}_{ik}(t) \ge 0$ of the backward equations (10.6). Now both
$\bar{P}_{ik}(t)$ and $P_{ik}(t)$ satisfy (10.10). For $i > k$ we have
$\bar{P}_{ik}(t) \ge P_{ik}(t) = 0$. Letting successively
$i = k, k-1, k-2, \dots$ we find that (10.9) holds for all $i$, $k$ and the
lemma is proved.

We can now sum up the situation in the following way. Two
contingencies can arise.

(a) The case $\sum \lambda_n^{-1} = \infty$. We know from section 4 that in
this case $\sum_k P_{ik}(t) = 1$. It follows that any other positive
solution of the backward equations necessarily adds to more than unity,
which is inadmissible. Accordingly, in this case we have the uniqueness for
the admissible solutions both of the forward and the backward equations.
The common solution represents the transition probabilities of a birth
process such that $\sum_k P_{ik}(t) = 1$. (It is easy to verify by
differentiation that the Chapman-Kolmogorov equation (9.1) holds.)


(b) The case $\sum \lambda_n^{-1} < \infty$. We know that in this case
$\sum_k P_{ik}(t) < 1$. Then

(10.12)    $L_i(t) = 1 - \sum_{k=0}^{\infty} P_{ik}(t)$

is the probability that, starting from $E_i$, "infinity" is reached before
time $t$. We know that (10.10) is satisfied by
$\bar{P}_{ik}(t) = P_{ik}(t)$ and by summation we see that

(10.13)    $L_i(t) = \lambda_i \int_0^t e^{-\lambda_i s} L_{i+1}(t - s) ds$

or

(10.14)    $L'_i(t) = -\lambda_i L_i(t) + \lambda_i L_{i+1}(t),$    $L_i(0) = 0.$

It follows that in this case the infinite system of differential equations
(10.14) has a non-zero solution $\{L_i(t)\}$ with $L_i(0) = 0$. With
arbitrary $A_k(t)$ the matrix


(10.15) Px(t) = P(t) + LiAl) 


is a solution of the backward equations (10.6) satisfying the initial condi- 
tions (10.7). 

The question arises whether the $A_k(t)$ can be defined in such a way
that the $\bar{P}_{ik}(t)$ become transition probabilities satisfying the
Chapman-Kolmogorov equation (9.1). The answer is in the affirmative. We
refrain from proving this assertion but shall give a probabilistic
interpretation.

The $P_{ik}(t)$ define the so-called absorbing barrier process: When the
system reaches infinity, the process terminates. Doob²² was the first to
study a return process in which, on reaching infinity, the system
instantaneously returns to $E_0$ (or some other prescribed state) and the
process starts from scratch. In such a process the system may pass from
$E_0$ to $E_5$ either in five steps or in infinitely many, having completed
one or several complete runs from $E_0$ to "infinity." The transition
probabilities of this process are of the form (10.15). They satisfy the
backward equations (10.6) but not the forward equations (10.5).

This explains why in the derivation of the forward equations we were 
forced to introduce the strange-looking assumption 3, which was un- 
necessary for the backward equations: The probabilistically and intui- 
tively simple assumptions 1-2 are compatible with return processes, 
for which the forward equations (10.5) do not hold. In other words, 
if we start from the assumptions 1-2 then Kolmogorov's backward
equations are satisfied, but to the forward equations another term must be
added.²³

The pure birth process is admittedly too trite to be really interesting,
but the conditions as described are typical for the most general case of
the Kolmogorov equations. Two essentially new phenomena occur,
however. First, the birth process involves only one escape route out
to "infinity" or, in abstract terminology, a single boundary point. By
contrast, the general process may involve boundaries of a complicated
topological structure. Second, in the birth process the motion is
directed toward the boundary because only transitions $E_n \to E_{n+1}$ are
possible. Processes of a different type can be constructed; for example,
the direction may be reversed to obtain a process in which only
transitions $E_{n+1} \to E_n$ are possible. Such a process can originate at
the boundary instead of ending there. In the birth and death process,
transitions are possible in both directions just as in one-dimensional
diffusion. It turns out that in this case there exist processes analogous
to the elastic and reflecting barrier processes of diffusion theory, but
their description would lead beyond the scope of this book.

22 J. L. Doob, Markoff chains—denumerable case, Transactions American
Mathematical Society, vol. 58 (1945), pp. 455-473.

23 Its form is given in the more recent paper cited in footnote 10, where
the various types of processes and the appropriate boundary conditions are
studied.


11. PROBLEMS FOR SOLUTION 


1. In the pure birth process defined by (3.2) let $\lambda_n > 0$ for all
$n$. Prove that for every fixed $n \ge 1$ the function $P_n(t)$ first
increases, then decreases to 0. If $t_n$ is the place of the maximum, then
$t_1 < t_2 < t_3 < \dots$. Hint: Use induction; differentiate (3.2).

2. Continuation. If $\sum \lambda_n^{-1} = \infty$ show that
$t_n \to \infty$. Hint: If $t_n \to \tau$, then for fixed $t > \tau$ the
sequence $\lambda_n P_n(t)$ increases. Use (4.10).

3. The Yule process. Derive the mean and the variance of the distribution
defined by (3.4). [Use only the differential equations, not the explicit
form (3.5).]

4. Pure death process. Find the differential equations of a process of the
Yule type with transitions only from $E_n$ to $E_{n-1}$. Find the
distribution $P_n(t)$, its mean, and its variance, assuming that the
initial state is $i$.

5. Parking lots. In a parking lot with $N$ spaces the incoming traffic is
of the Poisson type with intensity $\lambda$, but only as long as empty
spaces are available. Find the appropriate differential equations for the
probabilities $P_n(t)$ of finding exactly $n$ spaces occupied.

6. In a waiting line the customer who came last is served first.²⁴ Find the
appropriate differential equations for the probabilities $P_n(t)$ that
exactly $n$ newcomers will be served during the waiting time of a customer
picked at random.

7. The Polya process.²⁵ This is a non-stationary pure birth process with
$\lambda_n$ depending on time:

(11.1)    $\lambda_n(t) = \frac{1 + an}{1 + at}.$

Show that the solution with initial condition $P_0(0) = 1$ is

(11.2)    $P_0(t) = (1 + at)^{-1/a},$
          $P_n(t) = \frac{(1 + a)(1 + 2a) \cdots \{1 + (n-1)a\}}{n!} t^n (1 + at)^{-n-1/a}.$

Show from the differential equations that the mean and variance are $t$ and
$t(1 + at)$, respectively.

24 ... 1188-1189.
25 O. Lundberg, On random processes and their applications to sickness and
accident statistics, Uppsala, 1940.

8. Continuation. The Polya process can be obtained by a passage to the
limit from the Polya urn scheme, example V(2.c). If the state of the system
is defined as the number of red balls, then the transition probability
$E_k \to E_{k+1}$ at the $(n+1)$st drawing is

(11.3)    $\frac{r + kc}{r + b + nc} = \frac{p + k\gamma}{1 + n\gamma}$

where $p = r/(r + b)$, $\gamma = c/(r + b)$.

As in the passage from Bernoulli trials to the Poisson distribution, let
drawings be made at the rate of one in time $h$ and let $h \to 0$,
$n \to \infty$ so that $np \to t$, $n\gamma \to at$. Show that in the limit
(11.3) leads to (11.1). Show also that the Polya distribution V(2.3) passes
into (11.2).

9. Linear growth. If in the process defined by (5.7) $\lambda = \mu$, and
$P_1(0) = 1$, then

(11.4)    $P_0(t) = \frac{\lambda t}{1 + \lambda t},$    $P_n(t) = \frac{(\lambda t)^{n-1}}{(1 + \lambda t)^{n+1}}.$

The probability of ultimate extinction is 1.


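10. Continuation. Assuming a trial solution to (5.7) of the form
$P_n(t) = A(t) B^n(t)$, prove that the solution with $P_1(0) = 1$ is

(11.5)    $P_0(t) = \mu B(t),$    $P_n(t) = \{1 - \lambda B(t)\}\{1 - \mu B(t)\}\{\lambda B(t)\}^{n-1}$

with

(11.6)    $B(t) = \frac{1 - e^{(\lambda-\mu)t}}{\mu - \lambda e^{(\lambda-\mu)t}}.$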


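11. Continuation. The generating function $P(s, t) = \sum P_n(t) s^n$
satisfies the partial differential equation

(11.7)    $\frac{\partial P}{\partial t} = \{\mu - (\lambda + \mu)s + \lambda s^2\} \frac{\partial P}{\partial s}.$

12. Continuation. Let $M_2(t) = \sum n^2 P_n(t)$ and
$M(t) = \sum n P_n(t)$ (as in section 5). Show that

(11.8)    $M_2'(t) = 2(\lambda - \mu) M_2(t) + (\lambda + \mu) M(t).$

Deduce that when $\lambda > \mu$ the variance of $\{P_n(t)\}$ is given by

(11.9)    $e^{2(\lambda-\mu)t} \{1 - e^{(\mu-\lambda)t}\} (\lambda + \mu)/(\lambda - \mu).$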


13. For the process (7.2) the generating function
$P(s, t) = \sum P_n(t) s^n$ satisfies the partial differential equation

(11.10)    $\frac{\partial P}{\partial t} = (1 - s)\{-\lambda P + \mu \frac{\partial P}{\partial s}\}.$

Its solution is

$P = e^{-\lambda(1-s)(1-e^{-\mu t})/\mu} \{1 - (1 - s)e^{-\mu t}\}^i.$

For $i = 0$ this is a Poisson distribution with parameter
$\lambda(1 - e^{-\mu t})/\mu$. As $t \to \infty$, the distribution
$\{P_n(t)\}$ tends to a Poisson distribution with parameter $\lambda/\mu$.


14. For the process defined by (7.26) the generating function
$P(s, t) = \sum P_n(t) s^n$ satisfies the partial differential equation

(11.11)    $(\mu + \lambda s) \frac{\partial P}{\partial s} = a\lambda P,$

with the solution $P = \{(\mu + \lambda s)/(\lambda + \mu)\}^a$.


15. In the "simplest trunking problem," example (7.a), let $Q_n(t)$ be the
probability that starting from $E_n$ the system will reach $E_0$ before
time $t$. Prove the validity of the differential equations

(11.12)    $Q'_n(t) = -(\lambda + n\mu) Q_n(t) + \lambda Q_{n+1}(t) + n\mu Q_{n-1}(t),$    $(n \ge 2)$
           $Q'_1(t) = -(\lambda + \mu) Q_1(t) + \lambda Q_2(t) + \mu$

with the initial conditions $Q_n(0) = 0$.

16. Continuation. Consider the same problem for a process defined by an
arbitrary system of forward equations. Show that the $Q_n(t)$ satisfy the
corresponding backward equations (for fixed $k$) with $P_{0k}(t)$ replaced
by 1.

17. Show that the transition probabilities of the pure birth process and
those of the birth and death process satisfy the Chapman-Kolmogorov
equation (9.1).

18. Let $P_{ik}(t)$ satisfy the Chapman-Kolmogorov equation (9.1).
Supposing that $P_{ik}(t) > 0$ and that
$S_i(t) = \sum_k P_{ik}(t) \le 1$, prove that either $S_i(t) = 1$ for all
$t$ or $S_i(t) < 1$ for all $t$.

19. Ergodic properties. Consider a stationary ... states; that is, suppose
that the system of differential equations (9.9) is finite and that the
coefficients $c_j$ and $p_{jk}$ ... are linear combinations of
exponential ... is negative unless $\lambda = 0$. Conclude that the ...
transition probabilities is the same as in the case of finite Markov chains
except that the periodic case is impossible.


Answers to Problems 


CHAPTER I 


1. (a) $; (b) 3; ©) To- 

2. The events $S_1$, $S_2$, $S_1 \cup S_2$, and $S_1 S_2$ contain,
respectively, 12, 12, 18, and 6 points.

4. The space contains the two points HH and TT with probability 1/4; the
two points HTT and THH with probability 1/8; and generally two points with
probability $2^{-n}$ when $n \ge 2$. These probabilities add to 1, so that
there is no necessity to consider the possibility of an unending sequence
of tosses. The required probabilities are 15/16 and 2/3, respectively.

9. P{AB} = 4, P{A U B} = 3%, P{AB’} =}. 


12. $x = 0$ in the events (a), (b), and (g).
$x = 1$ in the events (e) and (f).
$x = 2$ in the event (d).
$x = 4$ in the event (c).

15. (a) $A$; (b) $AB$; (c) $B \cup (AC)$.

16. Correct are (c), (d), (e), (f), (h), (i), (k), (l). The statement (a)
is meaningless unless $C \subset B$. It is in general false even in this
case, but is correct in the special case $C \subset B$, $AC = 0$. The
statement (b) is correct if $C \supset AB$. The statement (g) should read
$(A \cup B) - A = A'B$. Finally (k) is the correct version of (j).

17. (a) $AB'C'$; (b) $ABC'$; (c) $ABC$; (d) $A \cup B \cup C$;
(e) $AB \cup AC \cup BC$; (f) $AB'C' \cup A'BC' \cup A'B'C$;
(g) $ABC' \cup AB'C \cup A'BC = (AB \cup AC \cup BC) - ABC$;
(h) $A'B'C'$; (i) $(ABC)'$.

18. $A \cup B \cup C = A \cup (B - AB) \cup \{C - C(A \cup B)\} = A \cup BA' \cup CA'B'$.


CHAPTER II 


1. (a) $26^3$; (b) $26^2 + 26^3 = 18{,}252$; (c) $26^2 + 26^3 + 26^4$. In a
city with 20,000 inhabitants either some people have the same set of
initials or at least 1748 people have more than three initials.

2. $64 \cdot 14 = 896$. For a chess board with $n^2$ fields the formula is
$n^2(2n - 2)$.

3. $2(2^{10} - 1) = 2046$.


n(n + 1) ionge L 
a E Or Oa 
43) 
6. (a) $p_1 = 0.01$, $p_2 = 0.27$, $p_3 = 0.72$.
(b) $p_1 = 0.001$, $p_2 = 0.063$, $p_3 = 0.432$, $p_4 = 0.504$.

7. $p_r = (10)_r 10^{-r}$. For example, $p_3 = 0.72$,
$p_{10} = 0.00036288$. Stirling's formula gives $p_{10} = 0.0003598 \dots$.

8. (a) $(9/10)^k$; (b) $(9/10)^k$; (c) $(8/10)^k$;
(d) $2(9/10)^k - (8/10)^k$; (e) $AB$ and $A \cup B$.


9. (b)atn—. 10. 9 + EA = gy. 


11. The probability of exactly r trials is (n — Dra + (n), = n™, 
12. (a) 1/1-3-5 +++ (2n — 1) = 2°n!/(2n)!; (b) (n!)/1-3 --- (Qn — 1) = 


2n 
SEMA ( z) x 
13. On the assumption of randomness the probability that all of twelve
tickets come either on Tuesdays or Thursdays is
$(2/7)^{12} = 0.0000003 \dots$. There are only $\binom{7}{2} = 21$ pairs of
days, so that the probability remains extremely small even for any two
days. Hence it is reasonable to assume that the police have a system.

14. Assuming randomness, the probability of the event is ($)? = 4 appr. 
No safe conclusion is possible. 

15. $(90)_{10} \div (100)_{10} = 0.330476 \dots$.

16. 251(5!)—55—25 = 0.00209 . ap 


17 An —2%)(n—r—1)!_ A%—r—1 
3 n! n(n — 1) j 
18. (a) sty; (b) sige- 


19. The probabilities are 1 — @)* = 0.517747 ... and 1 — (384 = 
= 0.491404 ..., 


20. (a) (n — N), + (n),. (b) (1 — N/n)". Forr = N 
are (a) 0.911812 .., ; (b) 0.912673 .... Forr = N 
(b) 0.348678 .... 


21. (a)\(1 — N/n)r—1., (b) (2) vr + ((n)y)r. 

22, (1 — 2/n)*—2. for the median 2+ = 0.7n, approximately. 

, the probabilities that three or four 
by the youngest girl are, respectively, 


= 3 the probabilities 
= 10 they are (a) 0.330476; 


breakages are caused (a) by one girl, (b) 
+i ~ 0.2 and 33, ~ 0.05, 


24. (a) 121/12! = 0.000054. () E 
30! 


25. Sea8 C) 12— ~ 0.00035 
26. (a) C 2r EJ sOn as 3) 2r (7) ‘ 


oG) G2) "+(@). 


(2° — 2)12-® = 0.00137 .... 
‘GDC
28. p = (AN (Sy) = 2/0. 


_ Gs a) Ga) Gs) _ @ Gs" 2), 


. p= C 

(i3) Ga) Cis (is 
. Cf. problem 29. The probability is 

a OEA 
4 G) bs — +) z ie j 
OG CE aa 
j 2\ /39\ 72 
13) (as) Gs) 

33. (a) 24p(5, 4, 3, 1); (b) 4p(4, 4, 4, 1); (c) 12p(4, 4, 3, 2). 

GG) CG) 
~ 
13 
hand contains a cards of some suit, b of another, etc.) 


35. po(r) = (52 — r)a + (52); pi(r) = 4r(52 — r)a + (52)a; 
pr) = 6r(r — 1)(52 — r)2 + (52); 
polr) = 4r(r — I(r — 2)(52 — r) + (52); pa(r) = (r)a + (52)4. 

36. The probabilities that the waiting times for the first, ..., fourth ace 
exceed r are wi(r) = polr); wa(r) = polr) + pi(r); wa(r) = polr) + prlr) + pal); 
wr) = 1 — par). Next f(r) = wr) — w{r + 1). The medians are 8, 20, 
32, 44. 


OE NETO 
"miso (96E) OP mes 


39. CE (Era). 40. F )eeto. 


ti T2 


N 
a 


i] 
© 


s 


w 
-= 


34. (Cf. problem 33 for the probability that the 


ay, atte tre)! 49. (49), + (62) 


ri!ralra! 
43. P{(7)}                   = 10 · 10^{-7} = 0.000001
    P{(6, 1)}                = 0.000063
    P{(5, 2)}                = 0.000189
    P{(5, 1, 1)}             = 0.001512
    P{(4, 3)}                = 0.000315
    P{(4, 2, 1)}             = 0.007560
    P{(4, 1, 1, 1)}          = 0.017640
    P{(3, 3, 1)}             = 0.005040
    P{(3, 2, 2)}             = 0.007560
    P{(3, 2, 1, 1)}          = 0.105840
    P{(3, 1, 1, 1, 1)}       = 0.105840
    P{(2, 2, 2, 1)}          = 0.052920
    P{(2, 2, 1, 1, 1)}       = 0.317520
    P{(2, 1, 1, 1, 1, 1)}    = 0.317520
    P{(1, 1, 1, 1, 1, 1, 1)} = 0.060480

44. Letting S, D, T, Q stand for simple, double, triple, and quadruple,
respectively, we have
    P{22S}           = 0.52430
    P{20S + 1D}      = 0.35208
    P{18S + 2D}      = 0.09695
    P{16S + 3D}      = 0.01429
    P{19S + 1T}      = 0.00680
    P{17S + 1D + 1T} = 0.00336
    P{14S + 4D}      = ...
    P{15S + 2D + 1T} = 0.0006...
    P{18S + 1Q}      = 0.00009

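45. Let $q = \binom{52}{5} = 2{,}598{,}960$. The probabilities are:
(a) $4/q$; (b) $13 \cdot 12 \cdot 4 \cdot q^{-1} = 1/4165$; (c) $13 \cdot 12 \cdot 4 \cdot 6 \cdot q^{-1} = 6/4165$;
(d) $9 \cdot 4^5 \cdot q^{-1} = 192/54145$; (e) $13 \cdot \binom{12}{2} \cdot 4 \cdot 4^2 \cdot q^{-1} = 88/4165$;
(f) $\binom{13}{2} \cdot 11 \cdot 6 \cdot 6 \cdot 4 \cdot q^{-1} = 198/4165$; (g) $13 \cdot \binom{12}{3} \cdot 6 \cdot 4^3 \cdot q^{-1} = 352/833$.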
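
The values listed under answer 43 can be recomputed by direct counting. The sketch below assumes that the answer refers to seven random decimal digits (ten equally likely cells), which is what the leading factor $10 \cdot 10^{-7}$ suggests; the function name and the way the patterns are encoded as tuples are choices made for this illustration only.

```python
from collections import Counter
from math import factorial


def partition_probability(parts, cells=10):
    """Probability that sum(parts) random digits show the multiplicities `parts`."""
    balls = sum(parts)
    # Ways to split the positions into groups of the given sizes.
    arrangements = factorial(balls)
    for size in parts:
        arrangements //= factorial(size)
    # Ways to choose which digits carry the groups; groups of equal size
    # are interchangeable, hence the division by the multiplicity factorials.
    choices = factorial(cells) // factorial(cells - len(parts))
    for multiplicity in Counter(parts).values():
        choices //= factorial(multiplicity)
    return arrangements * choices / cells ** balls


if __name__ == "__main__":
    patterns = [(7,), (6, 1), (5, 2), (5, 1, 1), (4, 3), (4, 2, 1),
                (4, 1, 1, 1), (3, 3, 1), (3, 2, 2), (3, 2, 1, 1),
                (3, 1, 1, 1, 1), (2, 2, 2, 1), (2, 2, 1, 1, 1),
                (2, 1, 1, 1, 1, 1), (1, 1, 1, 1, 1, 1, 1)]
    for pattern in patterns:
        print(pattern, round(partition_probability(pattern), 6))
```

The fifteen patterns exhaust the partitions of 7, so the probabilities must add to unity; this provides a check on the table as a whole.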
CHAPTER IV 


1. 99/323. 2. 0.21 .... 3. 1/4. 4. 7/28, 


5. 1/81 and 31/6°. 
6. If Ax is the event that (k, k) does not appear, then from (1.5) 


mee- OCO ~ G) Ga) + 8 Cas) ~ Ge) 
T. Put p= (S): Then Sı = 13 (3) ek ($) (**) T 
Ss = 40- E p. Numerically, Pio = 0.9658; Pm = 0.0341; Pi = 0.0001, 
approximately. 
TETO 


N — k) 
9. p = DEN G P - See II(12.18) for a proof that the two 


formulas agree. X } 
10. The general term is @u:,22k2 --- @Nkws where (kı, ke, .--, ky) is a permu- 
tation of (1,2, .--,N)- Fora diagonal element k, = ». 


n ny (ns — ks)r, 
14. Note that, by definition, $u_r = 0$ for $r < n$ and $u_n = n! s^n/(ns)_n$.


n a1 (% — 1) (ns — ks)-—1 
15. tr — ta = È (IF a as 


(ns — 1); ; 
sere, 0 
6) GEG) G)- 4 
1. Use (5) 5 = (7) P5”) 
$P_{[0]} = 0.264$, $P_{[1]} = 0.588$, $P_{[2]} = 0.146$, $P_{[3]} = 0.002$, approximately.
18. Use (73) 5 = G) Ga- a) 
$P_{[0]} = 0.780217$, $P_{[1]} = 0.204606$, $P_{[2]} = 0.014845$,
$P_{[3]} = 0.000330$, $P_{[4]} = 0.000002$, approximately.
19. mIN hig -Fyw D pIa. 


20. Cf. the following formula with r = 2. 
21. (rN) lz = G) rN — 2)! — E) PON — 3) +... 


+ (=1)*r%GN — N)! 


n 
ee — re eee, 


25. Use II(12.16) and (124) 
26. Put $U_N = A_1 \cup \cdots \cup A_N$ and note that $U_{N+1} = U_N \cup A_{N+1}$ and
$U_N A_{N+1} = (A_1 A_{N+1}) \cup \cdots \cup (A_N A_{N+1})$.


CHAPTER V 
ia (5) 


(6)s 


7 -3 
3. (a) C E (3) = 0.182... 


35) (is) = O41 .... (j1 — 0.182 — 0.411 = 0.407, approximately. 


10-59 
BR = eg = ORL 2. 


The probability of exactly one ace is 


23\ 5 

10, atk 12 13 
4. (a) 2 25) ~ 50 ) 2 ane 

13 


T
6. Hi HS; Bs. ae 9. 0 101-8.
12. E 13. ©) 3; (0 2-1 + 2). 


14. (d) Put an = In — $, bn = Yn — 4) Cn = Za — 3. Then |an| + [bal + 
+ len| = hf lanyil + Ibne} + lengil}. Hence |an) + lbn] + lcn] in- 
creases geometrically. 

15. $p = (1 - p_1)(1 - p_2)\cdots(1 - p_n)$.

16. Use $1 - x < e^{-x}$ for $0 < x < 1$ or Taylor’s series for $\log(1 - x)$; cf.
II(8.12).


Are. 
b+e+r 
19. If the statement is true for the nth drawing regardless of b, r, and c,
then the probability of black at the (n+1)st trial is
$$\frac{b}{b+r}\cdot\frac{b+c}{b+r+c} + \frac{r}{b+r}\cdot\frac{b}{b+r+c} = \frac{b}{b+r}.$$
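The induction in answer 19 can be checked numerically by conditioning on the first drawing, exactly as the argument does. This is a minimal sketch assuming the urn scheme in which the ball drawn is returned together with c additional balls of the same color; the function name is hypothetical.

```python
from fractions import Fraction


def black_probability(n, b, r, c):
    """Probability of black at the nth drawing from an urn with b black and
    r red balls, when each drawn ball is replaced along with c of its color."""
    first_black = Fraction(b, b + r)
    if n == 1:
        return first_black
    # Condition on the color obtained at the first trial.
    return (first_black * black_probability(n - 1, b + c, r, c)
            + (1 - first_black) * black_probability(n - 1, b, r + c, c))


print([black_probability(n, 3, 5, 2) for n in range(1, 6)])
```

For every n the result is b/(b+r), here 3/8, in agreement with the displayed identity.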
20. The preceding problem states that the assertion is true for m = 1 and
all n. For induction, consider the two possibilities at the first trial.


23. Use II(12.9). 

26. From (5.2) $w = 2p(1 - p) \le \tfrac12$.

28. (a) u?; (b) u? +.w + 07/4; (c) u? + (25u + w? + vw + 2uw)/16. 
33. pu = pz = 2pa = P, Piz = Pa = 2p2s = Q, Pu = Pa = 0, po = 


CHAPTER VI


18. 


1. 5/16. 2. The probability is 0.02804 .... 3. $(0.9)^x \le 0.1$, $x \ge 22$.
4. $(1 - p)^x \le \tfrac12$ and $(1 - 4p)^x \le \tfrac12$ with $p = \binom{48}{9}\binom{52}{13}^{-1}$. Hence $x \ge 263$
and $x \ge 66$, respectively.
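The two thresholds in answer 4 can be recomputed by repeated multiplication. The sketch assumes that p is the probability that a given bridge hand contains all four aces and that 4p is the corresponding probability for some one of the four players; `dealings_needed` is a hypothetical helper name.

```python
from math import comb


def dealings_needed(success_probability):
    """Smallest x for which the chance of no success in x independent
    dealings has dropped to 1/2 or below."""
    x, miss = 0, 1.0
    while miss > 0.5:
        miss *= 1 - success_probability
        x += 1
    return x


p = comb(48, 9) / comb(52, 13)
print(dealings_needed(p), dealings_needed(4 * p))
```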


5. $1 - (0.8)^{10} - 2(0.8)^9 = 0.6242\ldots$
6. $\{1 - (0.8)^{10} - 2(0.8)^9\}/\{1 - (0.8)^{10}\} = 0.6993\ldots$
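Answers 5 and 6 are binomial computations with n = 10 and p = 0.2: the first is the probability of at least two successes, the second the same probability conditioned on at least one success. A short check, written as a sketch with a hypothetical helper `binomial`:

```python
from math import comb


def binomial(k, n, p):
    """b(k; n, p), the probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.2
at_least_two = 1 - binomial(0, n, p) - binomial(1, n, p)
given_at_least_one = at_least_two / (1 - binomial(0, n, p))
print(round(at_least_two, 4), round(given_at_least_one, 4))
```

Note that $b(1; 10, 0.2) = 10(0.2)(0.8)^9 = 2(0.8)^9$, which is the term appearing in the answers.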


1. E (i) + EA = 0.003954 ..., and EA a = 0.00952 .... 


8. E) {6-8 — 2-12}. 


9. True values: 0.6651 ..., 0.40187 ..., and 0.2009 ...; Poisson approximations:
$1 - e^{-1} = 0.6321\ldots$, 0.3679 ..., and 0.1839 ....
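The comparison in answer 9 can be reproduced as follows. The sketch assumes the true values are binomial with n = 6 and p = 1/6 (at least one success, exactly one, exactly two), which is what the printed figures correspond to, and that the Poisson parameter is np = 1.

```python
from math import comb, exp, factorial

n, p = 6, 1 / 6
lam = n * p


def binomial(k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson(k):
    return exp(-lam) * lam ** k / factorial(k)


true_values = [1 - binomial(0), binomial(1), binomial(2)]
approximations = [1 - poisson(0), poisson(1), poisson(2)]
print([round(v, 4) for v in true_values])
print([round(v, 4) for v in approximations])
```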


10. $e^{-2}\sum_{k=4}^{\infty} 2^k/k! = 0.143\ldots$ 11. $e^{-1}\sum_{k=3}^{\infty} 1/k! = 0.080\ldots$


12. $e^{-x/100} \le 0.05$ or $x \ge 300$.

13. $e^{-1} = 0.3679\ldots$, $1 - 2e^{-1} = 0.264\ldots$
14. $e^{-x} \le 0.01$, $x \ge 5$. 15. 1/p = 649,740.
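The Poisson figures in answers 10 to 14 and the number in answer 15 are quick to recheck. In this sketch `poisson_tail` is a hypothetical helper that truncates the series after 60 terms, which is ample for small parameters; the reading of answer 15 as the number of five-card hands divided by four is an assumption.

```python
from math import ceil, comb, exp, factorial, log


def poisson_tail(lam, k_min, terms=60):
    """Approximate sum of the Poisson probabilities p(k; lam) for k >= k_min."""
    return exp(-lam) * sum(lam ** k / factorial(k)
                           for k in range(k_min, k_min + terms))


tail_10 = poisson_tail(2, 4)          # answer 10
tail_11 = poisson_tail(1, 3)          # answer 11
x_12 = ceil(-100 * log(0.05))         # smallest integer x with exp(-x/100) <= 0.05
value_13 = 1 - 2 * exp(-1)            # answer 13
x_14 = ceil(-log(0.01))               # smallest integer x with exp(-x) <= 0.01
odds_15 = comb(52, 5) // 4            # answer 15, assuming p = 4 / C(52, 5)

print(round(tail_10, 3), round(tail_11, 3), x_12, round(value_13, 3), x_14, odds_15)
```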
16. $1 - p^n$ where $p = p(0; \lambda) + \cdots + p(k; \lambda)$.
18. g? for k = 0; pq for k = 1, 2, 3; and pg? — pg' for k = 4.


22: È es = C) i (4) tor large n. 


a+b—1 -1 b z a : 
20. Zz oer ` ) pet’, This can be written in the alternative 
bat va tk— 


form pe >> k 3) g, where the kth term equals the probability that 
km0 


the ath success occurs directly after k < b — 1 failures. 
2N —-1-r 


21. z, = N= 


) ` Q-2N+rt1, 


N N = ià, 
22, (a) z = E r27- = N y (2N — 1 "); ©) Use I1(12.6). 
rel 


‘1 N-1 
23. ki = npi, ki = np whence n = kyke/ky. 


A (n— 8 E a)i PAR A ERE: as) 
oe 6) ( na )( Nr oP 


where 8; = ni +...+ ni. 
25. P = piga(Pige + pag). 
31. By the Taylor expansion for the logarithm

$$b(0; n, p) = q^n = (1 - \lambda/n)^n < e^{-\lambda} = p(0; \lambda).$$
The terms of each distribution add to unity, and therefore it is impossible that
all terms of one distribution should be greater than the corresponding terms
of the other.
32. There are only finitely many terms of the Poisson distribution which
are greater than $\epsilon$, and the remaining ones dominate the corresponding terms
of the binomial distribution.

CHAPTER VII 


1. Proceed as in section 1. 2. Use (1.7). 
4. 0.99. 5. 500. 6. 66,400 


7. Most certainly. The inequalities of chapter VI suffice to show that an
excess of more than eight standard deviations is exceedingly improbable. 
8. $(2\pi n)^{-1}\{p_1p_2(1 - p_1 - p_2)\}^{-1/2}$.


3. &(— $2) = 0.143... 


CHAPTER VIII 
1, B = 21. 
2. z = pu + q+ rw, where u, v, w are solutions of 


ak Be ys 
oF tr) Eo owen ae 


w = pu +w +rw= r, 


A
l1- -l

8. u = ptt (o Hr) HE 
Tga RA 
v= (utra) E vsoto E 


4. Note that P{An} < (2p)", but 
Pil > 1 — aap entana, 


If p = 4}, the last quantity is ~ġłn; if p > 1, then P{A,} does not even tend 
to zero. 
CHAPTER IX 


1. The possible combinations are (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0), 
(2, 1), (3,0). Their probabilities are 0.047539, 0.108883, 0.017850, 0.156364, 
0.214197, 0.321295, 0.026775, 0.107098. 

2. (a) The joint distribution takes on the form of a six-by-six matrix. The
main diagonal contains the elements $q, 2q, \ldots, 6q$ where $q = \tfrac{1}{36}$. On one
side of the main diagonal all elements are 0, on the other $q$. (b) $E(X) = \tfrac72$,
$\mathrm{Var}(X) = \tfrac{35}{12}$, $E(Y) = \tfrac{161}{36}$, $\mathrm{Var}(Y) = \tfrac{2555}{1296}$, $\mathrm{Cov}(X, Y) = \tfrac{35}{24}$.
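The moments in answer 2 can be recovered by enumerating two throws of a die. The sketch assumes, as the shape of the joint matrix suggests, that X is the score of the first die and Y the larger of the two scores; the helper `expect` is hypothetical.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes


def expect(f):
    """Expectation of f(first score, second score)."""
    return Fraction(sum(f(a, b) for a, b in rolls), len(rolls))


EX = expect(lambda a, b: a)
EY = expect(lambda a, b: max(a, b))
var_x = expect(lambda a, b: a * a) - EX ** 2
var_y = expect(lambda a, b: max(a, b) ** 2) - EY ** 2
cov_xy = expect(lambda a, b: a * max(a, b)) - EX * EY

print(EX, var_x, EY, var_y, cov_xy)
```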

3. In the joint distribution of X, Y the rows are $32^{-1}$ times (1, 0, 0, 0, 0, 0),
(0, 5, 4, 3, 2, 1), (0, 0, 6, 6, 3, 0), (0, 0, 0, 1, 0, 0); of X, Z: (1, 0, 0, 0, 0, 0), (0, 5,
6, 1, 0, 0), (0, 0, 4, 6, 1, 0), (0, 0, 0, 3, 2, 0), (0, 0, 0, 0, 2, 0), (0, 0, 0, 0, 0, 1); of
Y, Z: (1, 0, 0, 0), (0, 5, 6, 1), (0, 4, 7, 0), (0, 3, 2, 0), (0, 2, 0, 0), (0, 1, 0, 0). Distribution
of X + Y: (1, 0, 5, 4, 9, 8, 5) all divided by 32, and the values of
X + Y ranging from 0 to 6; of XY: (1, 5, 4, 3, 8, 1, 6, 0, 3, 1) all divided by 32,
the values ranging from 0 to 9. E(X) = 5/2, E(Y) = 3/2, E(Z) = 31/16,
Var(X) = 5/4, Var(Y) = 3/8, Var(Z) = 303/256.
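The tables in answer 3 are consistent with five tosses of a coin in which X counts heads, Y counts runs of heads, and Z is the length of the longest run of heads. That interpretation is an inference from the tabulated rows, not a quotation of the problem, so the enumeration below should be read as a sketch under that assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby, product


def stats(seq):
    """Return (number of heads, number of head runs, longest head run)."""
    runs = [len(list(group)) for key, group in groupby(seq) if key == "H"]
    return seq.count("H"), len(runs), max(runs, default=0)


outcomes = list(product("HT", repeat=5))  # 32 equally likely sequences
xs, ys, zs = zip(*(stats(seq) for seq in outcomes))


def moments(values):
    """Mean and variance of a variable over equally likely outcomes."""
    mean = Fraction(sum(values), len(values))
    return mean, Fraction(sum(v * v for v in values), len(values)) - mean ** 2


sum_counts = Counter(x + y for x, y in zip(xs, ys))
sum_distribution = [sum_counts[v] for v in range(7)]

print(moments(xs), moments(ys), moments(zs), sum_distribution)
```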

4. PIZ =i X =j} = gtp if i>j and = (1 — g)g'p if i =j; no 
other values are possible. P{Z = i} = 2g‘p — q™'p — gtp. 

8. The distribution of V» is given by (3.5), that of Un follows by symmetry. 
mtn forr > s; 


PIX =r, Y =)= NM —s + 1)" Ar a)" tr ae D". 
if r>s, and =N™ if r=s. 
rn? — (r — 1)" 
pn? 
t= 9 —(r— DF 
z=0 if j>roor kr 
eo 
(n + 1)%(n + 2) 


12. P(N =n, K = k} = G) p"—*(qq')*- gp’. 


9. PIX<7,Y2 3) = 


10. z = if j <randk <r. 


if j<r and k=r, or j=r and k<r. 


ll. 


PIN = n} = (1 — 9p')"@p"- 


PIK = k) = arap (7571) oy = p'a”.
13. E ES) = Ekpr.n/(n + 1) = #p'g' = (1 = 4) (p + gg)”


ig ag a pp’ lo 1 k 
1- Q= gpr Egyp 


g. =U =m); LAN 
PEE S E(N) = gy Cov, N) W” 


2 ai 
PE,N) = E . 
14. p = pa + fp; E(X) = po! + gp; Var(X) = pg + qp — 2. 
15. i = BA tare P(X = m, Y = n} = pot 4 grtty with 
m, ni; EY) =2; o? = Xp + gp — 1). 


17. $\binom{n}{k}364^{n-k}365^{1-n}$.


18. (a) $365\{1 - 364^n\cdot 365^{-n} - n\,364^{n-1}\cdot 365^{-n}\}$; (b) $n > 28$.
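The expression in answer 18(a) is 365 times the probability that a given day is the birthday of at least two among n people. Assuming part (b) asks for the values of n at which this expectation exceeds 1, the threshold can be located directly; `expected_multiple_birthdays` is a hypothetical name.

```python
from fractions import Fraction


def expected_multiple_birthdays(n, days=365):
    """Expected number of days that are birthdays of two or more of n people."""
    p = Fraction(1, days)
    none = (1 - p) ** n
    exactly_one = n * p * (1 - p) ** (n - 1)
    return days * (1 - none - exactly_one)


smallest = next(n for n in range(1, 400) if expected_multiple_birthdays(n) > 1)
print(smallest, float(expected_multiple_birthdays(28)), float(expected_multiple_birthdays(29)))
```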

19. (a) $\mu = n$, $\sigma^2 = (n - 1)n$; (b) $\mu = (n + 1)/2$, $\sigma^2 = (n^2 - 1)/12$.
20. $E(X) = np_1$; $\mathrm{Var}(X) = np_1(1 - p_1)$; $\mathrm{Cov}(X, Y) = -np_1p_2$.
21. $-n/36$. This is a special case of 20.


r 


yi ) = PMN -r+k-1), 
25. ial ea Sees sere Var(Y,) -p= C= kF 
26. O1- HOER =N fi- e+) r 


27. Z(1 — p,)*. Put $X_j = 1$ or 0 according as the jth class is not or is
represented.
Ti(r2 + 1) Tire: — 1)(r2 + 1) 
28. E(X) = —-—~ Varz) e= AMT 
(x) Ti+ 12 ana . (ri +r = I(r + 7)? 


sae. _ _nbr(b+r+ ne} : 
30. Bi Fe 88) = TT 


serie Cr) re 


r—1 k r 
2 aya P ZP 
PaA p mat) + (=) rosz. 
> — T 
To derive the last formula from the first, put f(g) = 72k- gs D É. 
Using II(12.4), we find that f’(q) = Tq—{1 —q)-" The assertion now follows 
by repeated integrations by part. 


CHAPTER XI 
1. sP(s) and P(s?). 
2. (a) $(1 - s)^{-1}P(s)$; (b) $(1 - s)^{-1}sP(s)$; (c) $\{1 - sP(s)\}/(1 - s)$; (d)
$p_0s^{-1} + \{1 - s^{-1}P(s)\}/(1 - s)$; (e) $\tfrac12\{P(s^{1/2}) + P(-s^{1/2})\}$.
3. $U(s) = pqs^2/(1 - ps)(1 - qs)$. Mean $= 1/pq$, Var $= (1 - 3pq)/p^2q^2$.


6. A zero is the first, second, third, ... zero and therefore $U(s) = \sum F^k(s)$.
7. The generating function is $\{1 - F(s)\}(1 - s)^{-1} = (1 + s)U(s)$.
8. The generating function is {4F(s)}* = 2F(s) s~? — 1. 
9. Same generating function. 
10. The kth zero must occur at a trial number 2r < n and the ensuing 
n — 2r trials must not produce a zero. 
11. Use an obvious analogue to (1.6) for the case where P(1) < 1. 
12. Using the generating function for the geometric distribution of X, we 
have without computation 


Pipe =) l za ‘Gora == 


13. P,(s){N — (r — 1)s} = P,-1(s)(N — r — 1)s. 
s 1 2s rs 
N—(N—ls N- (N-2s N—(N—n)s- 


15. S, is the sum of r independent variables with a common geometric 
distribution. Hence 


Pe G = =) Pow =P" Ç uA i 7 ’) 


16. PIR =r} = EPs. = k}P{X, > v — k} = 


= Fey (487?) 9+ = pe (te 


km0 y—1 


14. P,(s) = 


w 
ER =14+5, VaR) a 


a (k-1 j 
21. un = q" + È (7 g °) Pun With w = 1, u = g i = g, i = 


= p? + q°. Using the fact that this recurrence relation is of the convolution 

type, 

1 (ps)? 

(1 — gs)* 
22. $u_n = pw_{n-1} + qu_{n-1}$; $v_n = pu_{n-1} + qv_{n-1}$; $w_n = pv_{n-1} + qw_{n-1}$. Hence
$U(s) - 1 = psW(s) + qsU(s)$; $V(s) = psU(s) + qsV(s)$; $W(s) = psV(s) + qsW(s)$.


U(s) = 


U(s). 


CHAPTER XIII 


1. It suffices to show that for all roots $s \ne 1$ of $F(s) = 1$ we have $|s| \ge 1$,
and that $|s| = 1$ is possible only in the periodic case.


2n J i : 
2. ton = Z) z ~(nn)-¥. Hence & is persistent only for r = 2. 


For r = 3 the tangent rule for numerical integration gives 


Emmet foe Glad
Hence by (3.5) the probability of ℰ ever occurring is, approximately, x = 1/3.
A more precise evaluation of the sum is 0.47 and leads to x = 0.32.

3. Sa = 0 is possible whenever n = Ka + b), and the binomial distri- 
bution shows that for such n we have PIS, =O} ~ (a + b)X2rabk)—. The 
series diverges. ; 

4. From Zf: + P{X, > 0} <1 conclude that f < 1 unless P{X; > 0} = 0. 
In this case all X} < 0 and & occurs at the first trial or never. 


5. Let u ..., uy be given and Un = Piln—1 + Pauna +... + PNUn—n for 
n>N. Then 


lim u, = “PN + up + py) +...4 un(pı + p: +...+ PN), 
i Pit 2p: + 3p: F... 
If ux = N—! then lim Un = N—(u, + 2u +.. -+ Nux). 
6 U= EHU, Fi = È GPH = sP% 


T. F(s) = $8f1 + Fy(s)} = s—F(s). 

8. Us(s) = {1 — Fas) = 4 + 2(1 + s)(1 — 8), This shows the prob- 
ability of a first Passage at time 2n through a positive point to equal 4 the 
probability of So, = 0. 

9. (a) F(s) = gs(1 — pst), = 1 4 Pq", o? = rp- (b) Za =n- N,, 
E(Z,) ~ npg(q + pr) ~, Var(Z,) = nr°pq(q + pr)-2, 

10. U(s) = 1 + gs +.. s+ gsr + qrer(1 — 3) 74 ul = 

1L. Na* ~ (Nu — 714.3)/22.75; a8) — X-i) =} 

12. Ta = Tni — Frae + rna with ro = n=n=]; 
R(s) = (8 + 28*\(8 — 8s + 2s? — 33)-1, Ta ~ 1.444248(1.139680)—7—1, 


g. 


then A(s) is given by (7.5) with P replaced by æ and 4q by l—a. Let B(s) 
and C(s) be the corresponding functions for B- and C-runs, The required 
generating functions are F(s) = 1 — U-\s), where in case (a) U(s) = A(s); 
in (b) U(s) = A(s) + B(s) — 1; in (c) U(s) = A(s) + Bis) + C(s) ~ 2, 

15. Use a straightforward combination of the method in example (8.b) and 


H in = ya, u(0) = Npg. 
+ Note that 1 — F(s) = (1 798) and u — Q(s) = (1 — R hence 
Q(1) = u, 2R(1) = 02 — +p’. The power series for Q-\s) = Sig ee 
converges for s = ], 


CHAPTER XIV 


1. The probability of ruin is still given b i = TUF 

ro y (2.4) with p = a(1 — y)-, 

g=B1—y)—. The expected durati i — y) with D, 
nb aT uration of the game is D,(1 1) wi 

2. The boundary conditions (2.2) 


a - =1-6 =Q. 
To (2.4) there corresponds the soluti p e e s 


on 


BAO 3): Here y — ô) + ôg/p — 1). 


The boundary conditions (3.2) become Do = 6D. 
=o; D, = 0; 
3. To (2.1) there Corresponds q, = P9242 + gga, and q- = \* is a particu- 
lar solution if A = pA? £ g, that is Geet Lor A +A = gp“ The prob- 


ability of ruin is
if q2 2p 


£ 
a: = (6+9 -3 ii g2 2p. 


5. Ww:n41(£) = PW:+1,n(T) + Gvz-1,n(x) with the boundary conditions (1) 
Wo,n(T) = Wa,n(z) = 0 for n > 1; (2) we,o(z) = 0 for z ¥ x and w.o(z) = 1. 
6. Replace (1) by wo,n(z) = Wiyn(z) and Wa,n(t) = Wa—1,n()- 


10, PiM, <2] = 3 Gaga —taew 


P{M, = z} =P{M, < z + 1} —P{M, < 2}. 
11. The first passage through x must have occurred at a time k < n, and
the particle returned from x to x in the following n − k steps.


CHAPTER XV 


1. P has rows (p, q, 0, 0), (0, 0, p, q), (p, q, 0, 0), and (0, 0, p, q). For n > 1
the rows are $(p^2, pq, pq, q^2)$.
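The claim in answer 1 is easy to confirm by multiplying the matrix out. The sketch below uses exact fractions with the arbitrary choice p = 1/3; `matmul` is a small hypothetical helper rather than a library call.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


p = Fraction(1, 3)
q = 1 - p
P = [[p, q, 0, 0], [0, 0, p, q], [p, q, 0, 0], [0, 0, p, q]]

P2 = matmul(P, P)
P3 = matmul(P2, P)
expected_row = [p * p, p * q, p * q, q * q]

print(P2[0], P3 == P2)
```

Since every row of the second power equals the same vector and that vector is reproduced by a further multiplication, all higher powers coincide with the second.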

2. (a) The chain is irreducible and ergodic; piè — 4 for all j,k. (Note 
that P is doubly stochastic.) (b) The chain has period 3, with G; containing 2, 
and EH»; the state E4 forms Gz, and Es forms Gs. We have u = uz = 3, 
uz = u, = 1. (c) The states E, and Es form a closed set Sı, and E4, Es another 
closed set Se, whereas Ez is transient. The matrices corresponding to the 
closed sets are two-by-two matrices with elements 3. Hence pf — 3 if E; 
and Ep belong to the same S,; pP — 0; finally p% — 3 if k = 1,3, and 
ps? — 0 if k = 2,4,5. (d) The chain has period 3. Putting a = (0, 0, 0, 3, 
2,4, b = (1,0,0,0, 0,0), ¢ = (0,3, 4,0,0,0), we find that the rows of 
P? = P’ =... area, b, b, c, c, c, those of P? = P’ =... are b, c, c, a, a, a, 
those of P = Pt = ... are c, a, a, b, b, b. 

3. pP = (9/8)", BE = (k/6)" — (Œ — 1)/6)* if k >j, and pẹ = 0 if 


k<j: 
4 n= Ghb u= G h i D \ 
6. Put p = Dnp,. The states are null states if p = œ. Stationary dis- 


tribution: uz = (pe + Pepa +--+) i 
7. Ergodic if D1 — qo)(1 — gı) --. (1 — Gna) < ©. Stationary distri- 


bution proportional to the terms of the series. 
8. ur = (p/d — P)/P-_ 
9. u, = {1 — p/ah(p/a)"* + {1 — (p/0)°}. 


10. py = JN —D/N%, Ppisa = (N —A/N?, Pij- = P/N, 


N\2 CH 
uy = + : 
N 

G) q P EAE E) 
O70" J 0 0 
0001 00 
13. P= a, ee a a ie aye: 
0 0 O70. 0 i 
gp 0 0 0 0
14. Note that the matrix is doubly stochastic; use example (6.5).

15. Put $p_{k,k+1} = 1$ for $k = 1, \ldots, N - 1$, and $p_{Nk} = p_k$.

16. Bupi = uz, then U(s) = uo(1 — s){P(s) — s}—!. For ergodicity it is 
necessary and sufficient that P’(1) < 1. 

23. Let M be the maximum of zj. Consider the states E, for which z, = M. 

26. If N > m — 2, the variables X‘ and X are independent, and hence 
the three rows of the matrix p{f*” are identical with the distribution of X™, 
namely, (3,2, 4). For n= m-+1 the three rows are (3, 3,0), (3, 3,3), 


(0, 3, 2) 
CHAPTER XVII 


3. E(X) = te; Var(X) = ie — 1), 
4, P’, = —\nP, +A(n + 1)Paai- 


P, = G) CNA (ay <4), 


E(X) = ie; Var(X) = te"(1 — e), 
5. P'a) = —( + nu) Palt) + Paral) + (n + 1)uPayalt) for n< N— 
and P'y() = —NuPy(t) +dPy_x(t). ea) sa ae 
6. Birth and death process with \, = r, Bn = BL.
19. The standard method of solving linear differential equations leads to a
system of linear equations. Cf. the hint contained in footnote 3 of chapter
XVI.

Index


Absolute probabilities 106; —in Markov 
chains 349, 373. 

Absorbing barriers 311, 329, 341. 

Absorbing boundaries in stochastic proc- 
esses 433, 

Absorbing states 349, 362. 

Absorption probabilities: in birth and 
death process 409, 410; in diffusion 
327, 336; in Markov chains 362ff., 
378, 392ff.; in random walk 313, 329, 
335 (in generalized random walk 331). 
[Cf. Duration of game; Extinction.] 

Acceptance cf. Inspection sampling. 

Accidents: distribution of damages 270, 
398; models involving Bernoulli 
trials with variable probabilities 216; 
models with urns 109, 111; occu- 
pancy models 10; Poisson distribu- 
tion 147; statistics of bomb hits 150. 

ADLER, H. A., and K. W. MILLER 420.

Aftereffect, urn models for — 109. [Cf.
Markov property.]

Age distribution 294ff., 309; (example 
involving ages of a couple 13, 16). 
Aggregates, self-renewing 284, 294, 309. 

Aging, absence of — 305, 411. 

ANDERSEN, cf. SPARRE ANDERSEN, E. 

ANDRÉ, D. 70, 335.

Animal populations: recaptures 43; 
trapping of — 160, 224, 269, 276. 

Aperiodic (= not periodic) cf. Periodic. 

Arc sine laws 77, 80, 86; counterpart 
72. 

Arrangements cf. Ballot problem; Occu- 
pancy problem; Ordering. 

Assignable causes 40. 

Atomic bomb 273. 

Average of a distr. = Expectation. 

Averages, moving 371, 379. 

Averaging, repeated 292, 308, 377. 


b(k; n, p) 137. 

BACHELIER, L. 323. 

Backward equations 421, 427, 430, 436. 

Bacteria counts 153. 

BAILEY, N. T. J. 43.

Ballot problem 66, 70. 

Balls in cells cf. Occupancy problem. 

BANACH’s match box problem 157, 212;
variants 160. 

Barriers, classification of 312, 341. 

BARTKY, W. 331, 346.

BATES, G. E. 267.

Bayes’s rule 114. 

BERNOULLI, D. 236.

BERNOULLI, J. 135.

BERNOULLI trials: definition 135; infinite 
sequences of — 183; number theo- 
retical interpretation 195. [Cf. Arc 
sine laws; Betting; Billiards; First 
passages; Random walk; Runs in 
Bernoulli trials; etc.]

BERNOULLI trials, multiple 158, 160, 223. 

BERNOULLI trials with variable proba- 
bilities: definition 205; Poisson ap- 
proximation 263; variance 216, 

BERNSTEIN, S. 117, 173.

BERTRAND, J. 66.

Beta function 163. 

Betting: ruin problem 313ff.; — in
games with infinite expectation 235ff.; 
— on runs 183, 197, 303; — systems 
185, 315; three players taking turns 
18, 24, 108, 130, 376. 

Billiards 265. 

Binomial coefficients 32, 48; identities 
for — 61, 85, 102; integrals for — 
325, 337. 

Binomial distribution 136; central term
139, 182; — combined with Poisson 
160, 269, 277; — as conditional distr.
in Poisson process 223; convolutions
of — 162, 252; expectation of — 209, 
252 (absolute expectation 226); gen- 
erating fet. 252; integral representa- 
tions of — 163, 323, 337; — as limit- 
ing form of hypergeometric distr. 57, 
161, and of Ehrenfest model 358; 
normal approximation to — 168ff.;
Poisson approximation to — 142, 161, 
176 (numerical examples 98, 143, 
159); — in occupancy problems 34, 
98; tails of — 139, 163, 178, 181; 
variance 214, 216, 252. 

Binomial distribution, the negative cf. 
Negative binomial distr. 

Binomial formula 49. 

Birth and death process 366, 407ff., 435; 
for servicing problems 413, 434. 

Birth process 402, 422, 434; divergent — 
404, 420ff. 

Birthdays: duplications 31, 46, 57, 440; 
expected numbers 210, 224; — as 
occupancy problem 10, 58; Poisson 
distr. for — 94, 144; (combinatorial 
problems involving — 55, 159). 

Bivariate generating functions 261, 277, 
309; — negative binomial 267; 
— Poisson 162, 261. 

BIZLEY, M. T. L. 66.

BLACKWELL, D. 286. 

Blood counts 153; — tests 225. 

BOLTZMANN-MAXWELL statistics 5, 21, 
39, 57f., 91, 103; — as limit for 
Fermi-Dirac statistics 56. 

BONFERRONI’s inequalities 100, 131.

BOOLE’s inequality 23.

BOREL, E. 191, 197.

BOREL-CANTELLI lemmas 188.

BOSE-EINSTEIN statistics 5, 20, 38ff., 59,
103. 

BOTTEMA, O., and S. C. VAN VEEN
265. 


Boundary points for stochastic processes 
433. 


Branching processes 272ff., 277. 

Breeding 132, 347, 376, 394. 

Bridge: ace distr. 11, 36, 55, 56, 158; 
composition of hands 33, 55, 56, 90, 
101; conditional probabilities 129; 
definition 8; waiting times 66; —
illustrating algebra of events 17, 24.
[Cf. Matching of cards; Shuffling.]
Brother-sister mating 132, 347, 376, 394. 
Brownian motion cf. Diffusion. 

Busy hour 400. 


Canonical decomposition of matrices 380. 

CANTELLI, F. P. 191; BOREL-CANTELLI
lemmas 188. 

Cantor, G. 18, 263. 

Cards cf. Bridge; Matching of cards; 
Poker; Shuffling. 

CATCHESIDE, D. G. 54, 269; —, D. E.
LEA, and J. M. THODAY 102, 152.

Causes, probability of 114. 

Cells, distr. of balls in, cf. Occupancy 
problems. 

Centenarians 145. 

Center of gravity 214. 

Central force in diffusion 344. 

Central limit theorem: applications of — 
to combinatorial problems 180, 241; 
— to frequency of decimals 196; 
— to hypergeometric distr. 180; — 
to Poisson distr. 176, 180; — to 
recurrent events 297; — to random 
walks 325; — to runs 180, 300; for 
binomial distr. 173; for Markov 
chains 373; for sums of random vari- 
ables 229, 238ff., 245. 

Chain letters 55. 

Chain reactions 272ff., 277. 

Chains, random, length of 225.

CHANDRASEKHAR, S. 377. 

Channels cf. Counters.

CHAPMAN, D. G. 43.

CHAPMAN-KOLMOGOROV equation: for
Markov chains 370, 373; for stochas- 
tic processes 424, 436. 

Characteristic equation 332. 

Characteristic values for matrices 384. 

CHEBYSHEV, P. L. 219; — inequality
219, 227. 

Chess problems 53, 101. 

Chi-square test: mentioned in connection 
with tabular material, but not defined. 

Chromosomes: breakages and inter- 
changes of 54, 102, 151, 152, 161, 269; 
explained 121ff. 

CHUNG, K. L. 72, 77, 227, 286, 375.

CLARKE, R. D. 150.

Classification, multiple 27. 

Closed sets, closures 349. 

COCHRAN, W. G. 41.

Coin tossing: as occupancy problem 11, 
46; as random walk 73, 311ff.; distr. 
of leads 68, 77ff.; empirical illustra- 
tion 21, 83; ties in multiple — 289, 
308. [Cf. BERNOULLI trials; First
passages; Random walk; Runs in 
Bernoulli trials.] 

Coincidences = matches 90, 97, 102. 

Collector’s problem 11, 46, 59, 102; 
moments 210, 224, 265. 

Colorblindness as sex-linked character 
126. 

Combinatorial problems: use — of cen- 
tral limit th. 180, 241; — of ran- 
domization 277. 

Combinatorial product 120. 

Combinatorial runs 40, 60, 225; normal 
distr. for — 180. 

Competition problem 175. 

Complementary events 15. 

Composite Markov processes (shuffling) 
372. 

Composition = convolution 250. 


Compound distributions 268; compounding
the binomial and Poisson
160, 269, 277.

Compound Poisson distribution and 
process 270, 398, 428; negative bino- 
mial as compound Poisson distr. 271. 

Conditional distribution 204, 223; — ex- 
pectation 209; — probability 104ff. 

Confidence level 176. 

Configurations in occupancy problems 
37, 56. 

Connections to a wrong number 152. 

Contagion 111f., 434. 

Continuity theorem 262. 

Convergence, almost everywhere and in
measure 196, 243. 

Convolutions 250. 

Coordinate space 120. 

Correlation coefficient 221. 

Cosmic rays 11, 404. 

Counters 57; — of type I 279, 294, 308, 
377; — of type II 308; waiting line 
and servicing problems 413ff.

Coupon collecting 11, 46, 59, 102; mo-
ments 210, 224, 265. 

Covariance 215, 222. 

Cramér, H. 149. 

Cumulative distr. function 168. 

Cycles 242. 

Cyclical random walk 343, 386. 

Cylindrical sets 120.


DAHLBERG, G. 129. 

Damage cf. Accidents; Radiation effects. 

DARWIN, C. 69.

Death process 434. 

Decimals: distribution of 195; — of e 
and x 30, 59. [Cf. Random digits.] 

Defective random variables 283. 

Defectives: inspection plans 158, 160, 
223, 331, 345 (blood tests 225); 
Poisson distr. for — 144; (elementary 
problems mentioning — 54, 130). 

“Degenerate processes” 404, 429. 

Delayed recurrent events 293, 352. 

DEMEREC, M. 269.

DEMOIVRE, A. 168, 248, 266.

DEMOIVRE-LAPLACE limit theorem 172,
181. 

Density fluctuations 377. [Cf. EHRENFEST model.]

Density function 168. 

Dependent cf. Independence; Stochastic. 

Derivatives, number of 37. 

Derman, C. 376. 

Descendants: in birth and death process 
402, 407; in branching processes 
272ff., 277; breeding 132, 347, 376, 
394; family relations of — 133; 
— in population and renewal theory 
295, 309; genetical models 121f., 
240, 347. 

Determinants (number of terms contain- 
ing diagonal elements) 101. 

Dice: ace runs 183, 197, 300; distr. of
scores 201, 214, 229; equalization of
ones, twos, ..., 281, 289; generating
fct. 266; — as occupancy problem
11; Weldon’s data 138; (elementary
problems 36, 46, 54, 101, 129,
158, 159, 179, 223, 376).

Difference equations 314; method of 
images 235; method of particular
solutions 314, 319, 332, 334; passage
to limit 324f., 336, 435; several 
dimensions 329, 337; special — : ab- 
sorbing barriers 365; — Ehrenfest 
model 358; — occupancy problem 
58, 265; — Polya distr. 131; — re- 
flecting barriers 376, 389; — renewal 
equation 290. 

Difference of events 16.

Differential equations, Kolmogorov’s: 
backward 427; forward 426; special 
cases 401ff.; uniqueness 431. 

Diffusion 323; absorption and first 
passages 326, 336; Ehrenfest model 
111, 343, 358, 377; — coefficient 
325. 

DIRAC-FERMI statistics 5, 39, 56.

Discrete sample space 17. 

Dishes, test involving breakage of 55. 

Disorder in chance fluctuations 217. 

Dispersion = variance 213. 

Distinguishable 11, 20, 39; two kinds of 
elements 34. 

Distribution: conditional 204, 223; joint 
— 200; marginal — 201; normal — 
164; — function 168, 200; [— of
balls in cells cf. Occupancy problems].

DODGE’s inspection plan 223.

DOEBLIN, W. 374.

DOMB, C. 277.

Dominant gene 122. 

Domino 53. 

DOOB, J. L. 186, 375, 433.

DORFMAN, R. 225.

Double Bernoulli trials 158, 160, 223. 

Double (bivariate) generating functions 
261. 

Double sampling 223, 331, 346. 

Doubling system 316. 

Doubly stochastic matrices 358. 

Drift 311, 325. 

Drugs, testing of 69, 139. 

Duality principle 70. 

Duration of games: in Markov chains 
378; in ruin problems 317ff., 334; 
in sequential sampling 330, 334. [Cf.
Extinction; First passages; Waiting
times.]

DVORETZKY, A., and T. MOTZKIN 66.


ℰ for recurrent events 278, 282.

e, distr. of decimals 30, 59. 

Efficiency, tests of 69, 139. 

EGGENBERGER, F. 109. 

EHRENFEST, P. and T. 111, 343; —
model for heat exchange and diffusion 
111, 343, 358, 377. 

Eigenvalue, eigenvector 384. 

EINSTEIN-BOSE statistics 20, 38ff., 59,
103. 

EINSTEIN-WIENER diffusion 323.

EISENHART, C., and F. S. SWED 40.

Elastic barrier 312, 334, 341. [Cf. Absorbing
barriers; Reflecting barriers.]

Elastic force in diffusion 344.

Elevator problem 11, 31, 56, 440. 

ELLIS, R. E. 323.

Equilibrium, macroscopic or statistical 
356, 409. 

ERDÖS, P. 80, 198, 286.

Ergodic properties — of Markov chains 
356ff., 378, 395; — of stochastic proc- 
esses 409, 436. 

Ergodic states 353. 

ERLANG, A. K. 413; —’s loss formula
418. 

Error function 168. 

Escapes in stochastic Processes 149ff. 

Essential states 353. 

Estimation, statistical, from: simple sam- 
ples 176, 211, 223; — repeated sam- 
pling 43; — independent observa- 
tions 160. 

Estimator, unbiased 227. 

Events 8, 13ff.; compatible — 88; independent
— 117; simultaneous realization
of — 16, 89, 96, 99; — in
product spaces 118ff.

Evolution 404.


Exclusive events 15. 

Expectation 207; conditional — 209; 
infinite — 249; — of products 208, 
215, 221; — of reciprocals 224, 226, 
227; — of sums 208. 

Experiments, conceptual 4, 7ff.; compound
and repeated — 118.

Exponential distribution 399, 411, 429; 
characterization by a functional equa- 
tion 413. 

Exponential holding times 305, 411. 




Extinction: in birth and death process 
410; in branching processes 274; 
— of genes 124, 274, 365. 

Extrasensory perception (ESP) 54, 368. 


F for failure 135. 

Factorials 29; gamma fct. 63; Stirling’s 
formula 50, 64, 169. 

“Fair” games 233ff., 246, 315; — with 
infinite expectations 236; unfavora- 
ble — 235, 246. 

Faltung = convolution 250. 

Families: problems — on sex distr. 107, 
108, 115, 130, 158, 269; — on dish- 
washing 55. 

Family names, survival of 273. 

Family relations 133. 

Family size, geometric distr. for 130, 274 

“Favorable” cases 23, 26. 

FERMI-DIRAC statistics 5, 39, 56.

Fire cf. Accidents. 

Firing at targets 10, 159. 

First occurrence cf. Waiting times. 

First passages in Bernoulli trials and 
random walks 74, 280, 312; expecta- 
tion 254, 317; explicit formulas 76, 
322, 335ff.; generating fcts. 254, 308,
318, 335ff.; limit theorems 87, 326,
336. [Cf. Duration of games; Re- 
turns; Waiting times.] 

First-passage times: in diffusion 226, 
335ff.; in Markov chains 352, 362 
(expectation 378, 395); in stochastic 
processes 436. 

Fish catches 43. 

FISHER, R. A. 6, 44, 138, 274, 347;
—’s logarithmic distr. 269.

Fission 273. 

Flaws in material 149, 159. 

FOKKER-PLANCK equation 326.

Forward equations 422, 426, 431ff. 

FRÉCHET, M. 88, 101, 375, 377, 380.

Frequency function 168. 

FRIEDMAN, B. 109, 343.

FROBENIUS’ theory of matrices 375.

Fry, T. C. 138, 413. 

Furry, W. H. 404. 

FÜRTH, R. 371; —’s formula 327.


G.-M. counters cf. Counters. 


GALTON, F. 69, 241, 273.

Gambling systems 185, 315. 

Gamma function 63. 

Gaussian (= normal) distribution 168. 

Generating functions 248ff.; bivariate — 
261, 277, 309; moment — 267, 277; 
use of — for solving difference eqns. 
318; — for differential eqns. of 
stochastic processes 435; — for 
Markov chains and matrices 380ff. 

Genes and genotypes 10, 121ff., 132; 
inheritance 240; Markov chains 347, 
365; mutations and survival 274, 
365, 403. 

Geometric distribution 156, 223, 304ff.; 
exponential limit 412; — for family 
size 130, 274; lack of memory 304, 
412; — as limit of Bose-Einstein 
statistics 59; — as negative binomial 
distr. 156, 210, 252; — in special 
problems 48, 59, 223, 276; — in 
stochastic processes 435. 

Gončarov, V. 243. 

GREENWOOD, J. A., and E. E. STUART
54, 368. 

GREENWOOD, R. E. 59.

Grouping in Markov chains 379. 

Grouping, tests of 40. 

Guessing 98, 217. 

GUMBEL, E. J. 145.


Haemophilia as sex-linked character 126. 

HARDY, G. H. 124, 196; —’s law 124,
132. 

Harris, T. E. 276, 376, 379. 

HAUSDORFF, F. 191, 196.

Heat exchange, Ehrenfest model for 111,
343, 358, 377. 

HELLY’s theorem 263. 

Higher sums 370. 

HODGES, J. L. 69, 72.

HOEFFDING, W. 217.

Holding times, exponential 305, 411. 

Homogeneity, tests of 41. 

Hostinsky, B. 375. 

Hypergeometric distribution 41ff., 55, 56, 
218; approximation of — by binomial 
57, 161; — by Poisson 162; — by 
normal 180; double — 45.

Hypothesis 105; statistical — cf. Tests,
statistical. 


Images, method of 70, 335. 

Implication 16. 

Improper random variable 283. 

Incoming traffic 412. 

Independence, stochastic (= statistical) 
114, 204, 227. 

Indistinguishable 11, 20, 39; two kinds 
of elements 34. 

Inertia, moment of 214. 

Infinite moments 249; limit theorems 
for — 231, 239, 246, 298; — in ran- 
dom walks 83, 87, 326, 336). 

Infinitely divisible distributions 271. 

Inheritance 121ff., 240. 

Initials 53. 

Insect litters and survivors 161, 269, 

Inspection sampling 42, 158, 160, 224; 
sequential — 330, 334, 345. 

Intersection of events 16. 

Inverse probabilities in Markov chains 
373. 

Inversions 241. 

Irradiation, harmful 10, 54, 102, 152, 
161, 269. 

Irreducible chains 349.

ISING’s lattice model 41.

Iterated logarithm, law of the 191, 196; 
generalized — 197, 198; — for 
Markov chains 374. 


KAC, M. 54, 80, 111, 343, 391.

KAKUTANI, S., and K. YOSIDA 378.

KARLIN, S., and J. McGREGOR 408.

KELVIN’s method of images 70, 335.

KENDALL, D. G. 269, 274, 410.

KENDALL, M. G., and B. SMITH 144.

Key problem 46, 54, 130, 224.

KHINTCHINE, A. 181, 192, 196, 229.

KOLMOGOROV, A. 6, 195, 286, 323, 353;
—’s criterion 243 (converse 247);
—’s differential equations 423ff.;
—’s inequality 220; CHAPMAN-KOLMOGOROV
equations 370, 373, 424, 436.

KOOPMAN, B. O. 4, 408.


Ladder points 280, 308.
LAGRANGE, J. L. 266, 322. 


LAPLACE, P. S. 168, 248, 377; —’s law
of succession 113; DeMoivre-Laplace 
limit theorem 172, 181. 

Largest observation, estimation from 211, 
223. 

Larvae 161, 269.

Latent roots and vectors 384. 

Law of the arc sine 77, 80, 86; counter- 
part 72. 

Law of the iterated logarithm 191ff., 196; 
generalized — 197; — for Markov 
chains 374. 

Law of large numbers, the strong 243, 247; 
— for Bernoulli trials 190, 196; — 
for Markov chains 374. 

Law of large numbers, the weak: for 
Bernoulli trials 141, 181, 198; classi- 
cal forms 228, 238, 244ff.; for de- 
pendent variables 246; generalized 
form (for infinite moments) 236; for 
Markov chains 374; for permuta- 
tions 241; for recurrent events 297. 

Law of rare events or small numbers 149. 

Law of succession 113. 

Leads, distribution of 67, 72, 77ff., 142;
empirical illustration 83. 

LEDERMANN, W., and G. E. REUTER
408. 

Lefthanders 159. 

Lévy, P. 80, 271. 

Lightning, distribution of damage 270,
398. 

LINDEBERG, J. W. 229, 239.

LITTLEWOOD, J. E. 196.

Lyapunov, A. 229, 246. 

Logarithm, inequalities and series for 48. 

Long chain molecules 11, 225.

Loss, coefficient of, 420. 

Loss formula, Erlang’s 418.

Lotka, A. J. 130, 273.

Lunch counter example 40. 

LUNDBERG, O. 434. 


McCrea, W. H., and F. J. W. WHIPPLE 
327, 330. 

M’Kendrick, A. G. 404.

McGregor, J., and S. Karlin 408.

Machine servicing 416ff. 

Macroscopic equilibrium 356, 409. 

Maier-Leibnitz, H. 279.

Malécot, G. 347.

Marbe, K. 136.

Margenau, H., and G. M. Murphy 39.

Marginal distribution 201. 

Markov, A. 229, 338.

Markov chains: associated with stochastic
processes 366, 378, 409, 428;
definition 340; mixtures of — 379; 
superposition of — 372. 

Markov chains of higher order 376.

Markov process 368ff., 379; — with 
continuous time 397ff., 423ff. 

Markov property 305, 369.

Match box problems 157, 160, 212.

Matching of cards 90, 97, 102, 217. 

Mating (assortative and random) 122, 
132; brother-sister — 132, 347, 376, 
394. 

Matrix: canonical decomposition 380;
— notation 133, 348, 384; parti- 
tioned — 351, 355, 392; stochastic 
— 340 (doubly stochastic — 358; 
non-stochastic — 374, 392ff.). 

Maxima in random walks: distribution
335; position 86. [Cf. Largest observation.]

Maximum likelihood 44. 

Maxwell-Boltzmann statistics 5, 21,
39, 57ff., 91, 103; — as limit for 
Fermi-Dirac statistics 56. 

Mean, cf. Expectation. 

Median 48, 207.

Memory in waiting times 305, 411. 

MENDEL, G. 121. 

Méré’s paradox 54.

Miller, K. W., and H. A. Adler 420.

Mises, R. von 6, 31, 94, 186, 191, 300,
310. 

Misprints 11; Fermi-Dirac distr. for — 
39, 56; Poisson distr. for — 145, 159. 

Mixtures: of distributions 277; of 
Markov chains 379; of populations 
111ff. 

Molecules, long chain 11, 225. 

Molina, E. C. 145, 177.

Moment generating function 267, 277. 

Moment of inertia 214. 

Moments 213; infinite — 249. 

Montmort, P. R. 90.

Mood, A. M. 180.


Moran, P. A. P. 160, 161. 

Morse code 53. 

Mortality 294ff., 309. 

Motzkin, T., and A. Dvoretzky 66.

Moving averages 371, 379. 

Multinomial coefficients 35. 

Multinomial distribution 157, 203, 224;
generating fct. 261; maximal term 
161, 180. 

Multiple Bernoulli trials 158, 160, 223. 

Multiple classification 27. 

Multiple Poisson distribution 162. 

Multiplets 27. 

Murphy, G. M., and H. Margenau 39.

Mutations 274, 404. 


(n)_r 28.

Negation 15. 

Negative binomial, bivariate 267. 

Negative binomial distribution 155, 210, 
252; — in birth process 404; infinite 
divisibility 271; — as limit of Bose- 
Einstein statistics 60, and of Polya’s 
distr. 132; the Poisson distr. as limit 
of — 162, 263. 

Neighbors, unlike 41. 

Newman, D. J. 197. 

Neyman, J. 154, 267.

Non-Markovian processes 370, 379. 

Normal approximation for: binomial
distr. 168f., 182 (tails 177, 180); 
combinatorial runs 180; hypergeo- 
metric distr. 180; Markov chains 
374; permutations 241, 242; Poisson
distr. 176, 180, 230; random walks 
326; recurrent events 297; success 
runs 300. 

Normal density and distribution 164; 
tails 166, 179. 

Normal numbers 197. 

Normalized random variables 215. 

Nuclear chain reactions 273. 

Null state 352. 

Number theoretical interpretations 195. 


a configurations and numbers 

6. 

Occupancy problems 20, 36; empirical 
interpretations 10; empty cells 58ff., 
90ff., 103, 226, 344, 387; Markov
chain treatment of — 344, 387; multiply
occupied cells 31, 34, 56ff.;
negative binomial limit 60; Poisson 
limit 58, 94; tables 38, 92, 95, 98, 
440. (Waiting times 45ff., 55, 56, 
210, 265; elementary problems 27, 
53ff., 101, 201, 224.) 

Optional stopping 173, 226, 233. 

Ordering 28; — of two kinds of ele- 
ments 34, 40. 

Ornstein, L. S. 323.


p(k; λ) 146.

Pairs 26. 

Palm, C. 413, 416, 420.

Panse, V. G., and P. V. Sukhatme 139.

Parapsychology 54, 368. 

Parking lots 54, 55, 434; — tickets 54. 

Partial fractions 257, 267; — for 
Markov chains 380ff.; for random 
walk 321ff., 333, 336, 388f.; for runs 
301; numerical examples 260, 295, 
302. 

“Particle” in random walks 74, 311. 

Particular solutions, method of 314, 319,
332, 334.

Partitioned matrices 351, 355, 392.

Partitions, combinatorial 32. 

Pascal distribution 156. 

Paths in random walks 67. 

Pearson, K. 163, 241. 

Pedestrians: as non-Markovian process
371; — crossing the street 159.

Periodic recurrent events 284; — states
353, 360. 

Permutations 28, 120, 241, 367. 

Persistent recurrent event 283; — state 
353. 

Petersburg paradox 235.

Petri plate 153. 

Phase space 13.

Photographic emulsions 11, 57. 

Poisson, S. D. 142. 

Poisson approximation or limiting form 
for: Bernoulli trials with variable 
probabilities 263; binomial distr.
142, 161, 176 (numerical examples 
98, 143, 159); density fluctuations 
377; hypergeometric distr. 162; 
matching 102ff.; negative binomial
162, 263; normal distr. 176, 230;
occupancy problems 58, 94; ran- 
domized sampling 203, 277; runs 
310; servicing and trunking prob- 
lems 414, 435. 

Poisson distribution 146; convolutions 
162, 252; generating fct. 252; inte- 
gral representation 163; moments 
209, 214; normal approximation to 
— 176, 180, 230; observation fitting 
the — 149ff. 

Poisson distribution: bivariate 261; 
compound 269, 398, 428; multiple 
162; spatial 149. (— combined with 
binomial distr. 160, 223, 269, 277.) 

Poisson process 400, 423, 428.

Poisson traffic 412.

Poisson trials = Bernoulli trials with 
variable probabilities. 

Poker 8, 33, 57, 102, 159, 441. 


Pollard, H. 286.
Polya, G. 109, 210, 329; —’s distribution
131; — process 434; —’s urn
model 109, 131, 225, 246, 370, 434.
Polymers 11, 225. 
Population 32; — stratified 107, 111. 
Population growth 296, 309, 409. [Cf.
Branching processes; Genes.] 
Positive state 353. 
Power supply problems 138, 418, 436. 
Product, combinatorial 120. 
Product measure and space 120.


Quality control 40. [Cf. Inspection sampling.]


Radiation effects 10, 54, 102, 152, 161, 
269. 

Radioactive disintegrations 149, 305, 403. 

Railroad competition problem 175. 

Raisins, distribution of 145, 149, 159. 

Random choice 29. 

Random digits (= random sampling 
digits) 11, 30, 59, 92, 143, 176. 

Random mating 122, 132. 

Random sampling cf. sampling. 

Random variables 199; defective (=
improper) — 283; integral-valued — 
248; Markovian — 368; normalized 
— 215; time dependent — 397ff. 
Random walk: cyclical 342, 386; generalized
330, 334, 345, 366; one-dimensional
73ff., 311f. (Markov
chain method 341, with variable
probabilities 366; renewal method
336); several dimensions 327. [Cf.
Absorbing barriers; Elastic barriers;
Reflecting barriers; Diffusion; First
passages; Maxima; Return to origin.]

Randomization method: in combinatorial 
problems 277; in sampling 203. 

Randomness in sequences 191; tests of 
— 40, 59, 69, 98. 

Range 200. 

Rank order test 69. 

Realization of events, simultaneous 16, 
88ff., 96, 99, 130, 131. 

Recapture in trapping experiments 43. 

Recessive genes 122. 

Reciprocals, expectation of 224, 227. 

Recurrence paradox 298.

Recurrence times 284, 309; — of ladder 
points 308; — in Markov chains 352,
395; — of runs 299ff. [Cf. Return 
to origin; Waiting times.] 

Recurrent events 278, 282; delayed — 
293; Markov chain treatment of — 
344, 359. 

Recurrent states 353. 

Reduced number of successes 172. 


Reflecting barriers 312, 335, 342, 376;
explicit solution 388; — in plane 377.
Reflection principle 66, 70, 335.
Rencontre (= coincidences) 90, 97, 102.
Renewal: equation 290ff., 306; — of
aggregates and populations 284, 294ff.,
309; — method in random walks 336.
Repairs of machines 416ff. 

Repeated averaging 292, 308, 377. 

Repeated trials 118ff.; random variable 
representation 205. 

Replacement cf. Renewal; Sampling. 

Retrospective (backward) equations 421, 
427, 430, 436. 

Return to origin in Bernoulli trials and
random walks 74; — by generating
fcts. 256, 265; — by Markov chains
366; — by recurrent events 280, 287;
empirical illustration 83; limit theorems
83, 87; — with variable probabilities
366; — in several dimensions
327; nth return 77, 87, 337; number
of returns 81, 265, 298.

Reuter, G. E., and W. Ledermann
408. 

Reversibility 373. 

Robbins, H. 52.

Romanovsky, V. 375.

Romig, H. C. 137.

Ruin problem 334ff. (Cf. Absorption
probabilities.)

Rumors, spread of 55.

Runs, combinatorial 40, 60; moments
225; normal distr. for — 180.

Runs in Bernoulli trials: definition 279;
Markov chain treatment 344, 348;
Poisson distr. for long — 310; — in
billiards 265; — of r successes before
m failures 183, 197, 303; — in a
traffic problem 159; theory 299ff.,
310.

Rutherford, E. 160.

Rutherford-Chadwick-Ellis 149.


S for success 135. 

Safety campaign 111. 

Sample average 230. 

Sample point 9, 14. 

Sample space 7, 14; discrete — 17; 
— for repeated trials and experiments 
118ff.; — in terms of random varia- 
bles 205. 

Sampling 28ff., 218, 225; — with and 
without replacement 28, 57, 101, 218; 
inspection — 42, 158, 160, 228; ran- 
domized — 203; repeated — 43, 160; 
required sample size 139, 176, 179, 
230; sequential — 330, 334, 345; 
stratified — 225; waiting times 45, 
102, 210; (elementary problems 53ff., 
203). 

SCHROEDINGER, E. 273. 

Schwarz’s inequality 227.

Seeds: Poisson distr. for 149; survival 
274. 

Selection, genetic 128, 132, 274. 

Self-renewing aggregates 284, 294, 309. 

Senator problem 34, 42. 

Sequential sampling 330, 334, 345. 
Sequential tests 160.

Sera, testing of 139. 

Servicing factor 416; — problems 413ff. 
(power supply 138, 418, 436). 

Sets: closed (in Markov chains) 349; 
— cylindrical 120. [Cf. Events.]

Seven-way lamps 27.

Sex distribution in families 107, 108, 
115, 130, 158, 269. 

Sex-linked characters 125ff. 

Shewhart, W. A. 40.

Shoe problems 55, 101. 

Shuffling 367; composite — 372. 

Smirnov, N. 180.


Smith, Babington, and M. G. Kendall 144.

SPARRE ANDERSEN, E. 65, 68, 77, 80, 87. 

Spurious contagion 111. 

Stable distribution of order ½ 87, 231.

Stakes, effect of changing 315. 

Standard deviation 213; for normal 
distr. 168; for number of successes 
172. 

Stars, distribution of 159. 

States in Markov chains 340; classifica- 
tion 351ff. 

Stationary distributions: of age 309; of 
genes 124, 132; in Markov chains 
356, 362, 376ff.

Stationary transition probabilities 340,
369.

Statistical (= stochastic) independence
114, 204, 227.

Steady state 356, 409.

Steinhaus, H. 157.

Sterilization laws 129.

Stirling’s formula 50, 169; alternative
form 64.
Stochastic independence 114, 204, 227.
Stochastic matrix 348; doubly — 358.
Stochastic process 368ff., 397. 
Stratification, urn models for 111. 
Stratified populations 107, 111. 
Stratified sampling 225.
Street crossing, pedestrian 159. 
Struggle for existence 404. 
Stuart, E. E., and J.
54, 368. 
Success 135; reduced number of — 172.
Succession, Laplace's law of 113. 
Sukhatme, P. V., and V. G. Panse 139.

Sums, higher 370. 

Sums of a random number of random 
variables 268. 

Superposition of Markov chains and 
shuffling 372. 

Survival: birth and death process 410; 
branching processes 272; family 
names 273; genes 124, 161, 274, 365. 

Swed, F. S., and C. Eisenhart 40.

Systems of gambling 185, 315. 


Tauberian theorems 306. 

Telephone: holding times 304, 411; 
traffic 152, 264, 400; trunking prob- 
lems 177, 413ff., 436. 

Testing: of blood 225; — of sera 139. 

Tests, statistical: of efficiency 69, 139; 
of grouping 40; homogeneity 41, 180; 
randomness 40, 54, 55, 59, 69, 98; 
rank order — 69; sequential — 160.

Theta functions 336. 

Thorndike, F. 152.

Ties: in billiards 265; in dice 281, 289; 
in multiple coin tossing 289, 308; in 
random walk 67. 

Time-homogeneous process 423.

Traffic, incoming 412. 

Traffic problems 159, 371. (Cf. Telephone.)

Transient: recurrent events 283; — 
states 353, 362ff., 131ff., 137ff. 

Transition probabilities: in Markov 
chains 340, 347, 368ff.; in stochastic 
processes 420.

Trapping, animal 160, 224, 268, 276. 

Trials, repeated 118; random variable 
representation 205.

Trinomial distribution 203, 224; generating
fct. 261.

Truncation method 232, 237, 239, 244,
246.
Trunking problems 177, 413ff., 436. 
Turns: in billiards 265; three players
taking — 18, 2, 108, 130, 376. 


UHLENBECK, G. E. 323, 343. 
Unbiased estimator 227. 
Unessential states 353. 
Uniform distribution 223, 266. 

Union of events 16. 

Unlike neighbors 41. 

Urn models: 108ff., 338; Ehrenfest — 
111, 343, 358, 377; Laplace — 113; 
Polya — 109, 131, 225, 246, 370, 434. 


Vaccines, testing of 139. 

Van VEEN, S. C., and O. BOTTEMA 265. 

Variance 213; — calculated from gen- 
erating fct. 250; — of normal distr. 
168; — of sums 215. 

Vaulot, E. 434.

Volterra’s theory of struggle for existence
404.


Waiting lines: branching process for 274; 
“last come first served” 434; Markov
chains for — 378; several channels 
412, 415, 418, 436; single channel 411, 
416. 

Waiting times: exponential 411; geo- 
metric 304; negative binomial 155, 
210, 252; — in billiards 265; — in 
collector’s problem 46, 59, 102, 210,
224, 265; — in combinatorial prob- 
lems 45, 55, 56; — in Markov chains 
352, 362, 378. [Cf. First passages; 
Recurrence times; Return to origin.]

Wald, A. 41, 160, 180, 233, 313, 331.

Wang, Ming Chen, and G. E. Uhlenbeck
343.

Welders problem 138, 420. 

Weldon’s dice data 138.

Whipple, F. J. W., and W. H. McCrea
327, 330. 

Whitworth, W. A. 26, 57.

Wiener, N. 323.

Wolfowitz, J. 41, 180, 286.

Wrong numbers, connections to 152. 

Wright, S. 347.


X-rays, effect on cells 10, 54, 102, 152, 
161, 269. 


Yosida, K., and S. Kakutani 378.
Yule, G. U. 404; — process 403, 434.